Preface


Progress in telecommunications over the past two decades has been nothing short of revolutionary,
with communications taken for granted in modern society to the same extent as electricity.
There is therefore a persistent need for engineers who are well-versed in the principles of
communication systems. These principles apply to communication between points in space, as well as
communication between points in time (i.e., storage). Digital systems are fast replacing analog
systems  in  both  domains.  This  book  has  been  written  in  response  to  the  following  core  question: 
what  is  the  basic  material  that  an  undergraduate  student  with  an  interest  in  communications 
should  learn,  in  order  to  be  well  prepared  for  either  industry  or  graduate  school?  For  example,  a 
number  of  institutions  only  teach  digital  communication,  assuming  that  analog  communication 
is  dead  or  dying.  Is  that  the  right  approach?  From  a  purely  pedagogical  viewpoint,  there  are 
critical  questions  related  to  mathematical  preparation:  how  much  mathematics  must  a  student 
learn  to  become  well-versed  in  system  design,  what  should  be  assumed  as  background,  and  at 
what  point  should  the  mathematics  that  is  not  in  the  background  be  introduced?  Classically, 
students  learn  probability  and  random  processes,  and  then  tackle  communication.  This  does  not 
quite  work  today:  students  increasingly  (and  I  believe,  rightly)  question  the  applicability  of  the 
material  they  learn,  and  are  less  interested  in  abstraction  for  its  own  sake.  On  the  other  hand, 
I  have  found  from  my  own  teaching  experience  that  students  get  truly  excited  about  abstract 
concepts  when  they  discover  their  power  in  applications,  and  it  is  possible  to  provide  the  means 
for  such  discovery  using  software  packages  such  as  Matlab.  Thus,  we  have  the  opportunity  to 
get a new generation of students excited about this field: by covering abstractions “just in time”
to shed light on engineering design, and by reinforcing concepts immediately using software
experiments in addition to conventional pen-and-paper problem solving, we can remove the lag
between  learning  and  application,  and  ensure  that  the  concepts  stick. 

This textbook represents my attempt to act upon the preceding observations, and is an
outgrowth of my lectures for a two-course undergraduate elective sequence on communication at
UCSB, which is often also taken by some beginning graduate students. Thus, it can be used as
the basis for a two-course sequence in communication systems, or a single course on digital
communication, at the undergraduate or beginning graduate level. The book also provides a review
or  introduction  to  communication  systems  for  practitioners,  easing  the  path  to  study  of  more 
advanced  graduate  texts  and  the  research  literature.  The  prerequisite  is  a  course  on  signals  and 
systems,  together  with  an  introductory  course  on  probability.  The  required  material  on  random 
processes  is  included  in  the  text. 

A  student  who  masters  the  material  here  should  be  well-prepared  for  either  graduate  school 
or the telecommunications industry. The student should leave with an understanding of
baseband and passband signals and channels, modulation formats appropriate for these channels,
random processes and noise, a systematic framework for optimum demodulation based on signal
space concepts, performance analysis and power-bandwidth tradeoffs for common modulation
schemes, an introduction to communication techniques over dispersive channels, and a hint of the
power of information theory and channel coding. Given the significant ongoing research and
development  activity  in  wireless  communication,  and  the  fact  that  an  understanding  of  wireless 
link  design  provides  a  sound  background  for  approaching  other  communication  links,  material 
enabling hands-on discovery of key concepts for wireless system design is interspersed throughout
the textbook.


The  goal  of  the  lecture-style  exposition  in  this  book  is  to  clearly  articulate  a  selection  of  concepts 
that  I  deem  fundamental  to  communication  system  design,  rather  than  to  provide  comprehensive 
coverage.  “Just  in  time”  coverage  is  provided  by  organizing  and  limiting  the  material  so  that  we 
get  to  core  concepts  and  applications  as  quickly  as  possible,  and  by  sometimes  asking  the  reader 
to  operate  with  partial  information  (which  is,  of  course,  standard  operating  procedure  in  the  real 
world  of  engineering  design). 


Organization 

•  Chapter  1  provides  a  perspective  on  communication  systems,  including  a  discussion  of  the 
transition  from  analog  to  digital  communication  and  how  it  colors  the  selection  of  material  in 
this  text.  Chapter  2  provides  a  review  of  signals  and  systems  (biased  towards  communications 
applications),  and  then  discusses  the  complex  baseband  representation  of  passband  signals  and 
systems,  emphasizing  its  critical  role  in  modeling,  design  and  implementation.  A  software  lab 
on  modeling  and  undoing  phase  offsets  in  complex  baseband,  while  providing  a  sneak  preview  of 
digital  modulation,  is  included. 

•  Chapter  2  also  includes  a  section  on  wireless  channel  modeling  in  complex  baseband  using  ray 
tracing,  reinforced  by  a  software  lab  which  applies  these  ideas  to  simulate  link  time  variations 
for  a  lamppost  based  broadband  wireless  network. 

•  Chapter  3  covers  analog  communication  techniques  which  are  relevant  even  as  the  world  goes 
digital,  including  superheterodyne  reception  and  phase  locked  loops.  Legacy  analog  modulation 
techniques  are  discussed  to  illustrate  core  concepts,  as  well  as  in  recognition  of  the  fact  that 
suboptimal  analog  techniques  such  as  envelope  detection  and  limiter-discriminator  detection 
may  have  to  be  resurrected  as  we  push  the  limits  of  digital  communication  in  terms  of  speed  and 
power  consumption. 

•  Chapter  4  discusses  digital  modulation,  including  linear  modulation  using  constellations  such 
as  Pulse  Amplitude  Modulation  (PAM),  Quadrature  Amplitude  Modulation  (QAM),  and  Phase 
Shift  Keying  (PSK),  and  orthogonal  modulation  and  its  variants.  The  chapter  includes  discussion 
of  the  number  of  degrees  of  freedom  available  on  a  bandlimited  channel,  the  Nyquist  criterion 
for  avoidance  of  intersymbol  interference,  and  typical  choices  of  Nyquist  and  square  root  Nyquist 
signaling  pulses.  We  also  provide  a  sneak  preview  of  power-bandwidth  tradeoffs  (with  detailed 
discussion  postponed  until  the  effect  of  noise  has  been  modeled  in  Chapters  5  and  6).  A  software 
lab  providing  a  hands-on  feel  for  Nyquist  signaling  is  included  in  this  chapter. 

The  material  in  Chapters  2  through  4  requires  only  a  background  in  signals  and  systems. 

•  Chapter  5  provides  a  review  of  basic  probability  and  random  variables,  and  then  introduces 
random processes. This chapter provides detailed discussion of Gaussian random variables,
vectors and processes; this is essential for modeling noise in communication systems. Examples
which provide a preview of receiver operations in communication systems, and computation of
performance measures such as error probability and signal-to-noise ratio (SNR), are provided.
Discussion of circular symmetry of white noise and noise analysis of analog modulation
techniques are placed in an appendix, since this material is often skipped in modern courses on
communication  systems. 

• Chapter 6 covers classical material on optimum demodulation for M-ary signaling in the
presence of additive white Gaussian noise (AWGN). The background on Gaussian random variables,
vectors and processes developed in Chapter 5 is applied to derive optimal receivers, and to analyze
their performance. After discussing error probability computation as a function of SNR, we are
able to combine the material in Chapters 4 and 6 for a detailed discussion of power-bandwidth
tradeoffs. Chapter 6 concludes with an introduction to link budget analysis, which provides
guidelines on the choice of physical link parameters such as transmit and receive antenna gains,
and distance between transmitter and receiver, using what we know about the dependence of
error probability on SNR. This chapter includes a software lab which builds on the
Nyquist signaling lab in Chapter 4 by investigating the effect of noise. It also includes another
software lab that simulates performance over a time-varying wireless channel, examines the effects
of fading and diversity, and introduces the concept of differential demodulation for avoidance of
explicit channel tracking.

Chapters 2 through 6 provide a systematic lecture-style exposition of what I consider core
concepts in communication at an undergraduate level.

• Chapter 7 provides a glimpse of information theory and coding, whose goal is to stimulate the
reader to explore further using more advanced resources such as graduate courses and textbooks.
It shows the critical role of channel coding, provides an initial exposure to information-theoretic
performance benchmarks, and discusses belief propagation in detail, reinforcing the basic
concepts through a software lab.

• Chapter 8 provides a first exposure to the more advanced topics of communication over
dispersive channels, and of multiple antenna systems, often termed space-time communication, or
Multiple Input Multiple Output (MIMO) communication. These topics are grouped together
because they use similar signal processing tools. We emphasize lab-style “discovery” in this chapter
using three software labs, one on adaptive linear equalization for single-carrier modulation, one on
basic  OFDM  transceiver  operations,  and  one  on  MIMO  signal  processing  for  space-time  coding 
and  spatial  multiplexing.  The  goal  is  for  students  to  acquire  hands-on  insight  that  hopefully 
motivates  them  to  undertake  a  deeper  and  more  systematic  investigation. 

•  Finally,  the  epilogue  contains  speculation  on  future  directions  in  communications  research  and 
technology.  The  goal  is  to  provide  a  high-level  perspective  on  where  mastery  of  the  introductory 
material in this textbook could lead, and to argue that the innovations that this field has already
seen  set  the  stage  for  many  exciting  developments  to  come. 

The role of software: Software problems and labs are integrated into the text, while “code
fragments” implementing core functionalities are provided in the text. While code can be provided
online, separate from the text (and indeed, sample code is made available online for
instructors), code fragments are integrated into the text for two reasons. First, they enable readers to
immediately  see  the  software  realization  of  a  key  concept  as  they  read  the  text.  Second,  I  feel 
that  students  would  learn  more  by  putting  in  the  work  of  writing  their  own  code,  building  on 
these  code  fragments  if  they  wish,  rather  than  using  code  that  is  easily  available  online.  The 
particular  software  that  we  use  is  Matlab,  because  of  its  widespread  availability,  and  because  of 
its  importance  in  design  and  performance  evaluation  in  both  academia  and  industry.  However, 
the  code  fragments  can  also  be  viewed  as  “pseudocode,”  and  can  be  easily  implemented  using 
other  software  packages  or  languages.  Block-based  packages  such  as  Simulink  (which  builds  upon 
Matlab)  are  avoided  here,  because  the  use  of  software  here  is  pedagogical  rather  than  aimed  at, 
say, designing a complete system by putting together subsystems as one might do in industry.


Suggestions  for  using  this  book 

I view Chapter 2 (complex baseband), Chapter 4 (digital modulation), and Chapter 6 (optimum
demodulation) as core material that must be studied to understand the concepts underlying
modern communication systems. Chapter 6 relies on the probability and random processes
material in Chapter 5, especially the material on jointly Gaussian random variables and white Gaussian noise (WGN),
but  the  remaining  material  in  Chapter  5  can  be  covered  selectively,  depending  on  the  students’ 
background.  Chapter  3  (analog  communication  techniques)  is  designed  such  that  it  can  be 
completely skipped if one wishes to focus solely on digital communication. Finally, Chapters
7 and 8 contain glimpses of advanced material that can be sampled at the
instructor’s discretion. The qualitative discussion in the epilogue is meant to provide the student
with  perspective,  and  is  not  intended  for  formal  coverage  in  the  classroom. 

In  my  own  teaching  at  UCSB,  this  material  forms  the  basis  for  a  two-course  sequence,  with 
Chapters 2-4 covered in the first course, and Chapters 5-6 covered in the second course, with the
dispersive channels portion of Chapter 8 providing the basis for the labs in the second course.
The content of these courses is constantly being revised, and it is anticipated that the material
on  channel  coding  and  MIMO  may  displace  some  of  the  existing  material  in  the  future.  UCSB  is 
on  a  quarter  system,  hence  the  coverage  is  fast-paced,  and  many  topics  are  omitted  or  skimmed. 
There  is  ample  material  here  for  a  two-semester  undergraduate  course  sequence.  For  a  single 
one-semester  course,  one  possible  organization  is  to  cover  Chapter  4,  a  selection  of  Chapter  5, 
Chapter 6, and, if time permits, Chapter 7.
Chapter 1
Introduction 


This textbook provides an introduction to the conceptual underpinnings of communication
technologies. Most of us directly experience such technologies daily: browsing (and audio/video
streaming from) the Internet, sending/receiving emails, watching television, or carrying out a
phone conversation. Many of these experiences occur on mobile devices that we carry around
with us, so that we are always connected to the cyberworld of modern communication systems.
In addition, there is a huge amount of machine-to-machine communication that we do not
directly experience, but which is indispensable for the operation of modern society. Examples
include  signaling  between  routers  on  the  Internet,  or  between  processors  and  memories  on  any 
computing  device. 

We define communication as the process of information transfer across space or time.
Communication across space is something we have an intuitive understanding of: for example, radio
waves carry our phone conversation between our cell phone and the nearest base station, and
coaxial cables (or optical fiber, or radio waves from a satellite) deliver television from a remote
location to our home. However, a moment’s thought shows that communication across time,
or  storage  of  information,  is  also  an  everyday  experience,  given  our  use  of  storage  media  such  as 
compact  discs  (CDs),  digital  video  discs  (DVDs),  hard  drives  and  memory  sticks.  In  all  of  these 
instances,  the  key  steps  in  the  operation  of  a  communication  link  are  as  follows: 

(a)  insertion  of  information  into  a  signal,  termed  the  transmitted  signal,  compatible  with  the 
physical medium of interest;

(b)  propagation  of  the  signal  through  the  physical  medium  (termed  the  channel)  in  space  or 
time; 

(c) extraction of information from the signal (termed the received signal) obtained after
propagation through the medium.

In  this  book,  we  study  the  fundamentals  of  modeling  and  design  for  these  steps. 

Chapter Plan: In Section 1.1, we provide a high-level description of analog and digital
communication systems, and discuss why digital communication is the inevitable design choice in
modern systems. In Section 1.2, we briefly provide a technological perspective on recent
developments in communication. We do not attempt to provide a comprehensive discussion of the
fascinating  history  of  communication:  thanks  to  the  advances  in  communication  that  brought  us 
the  Internet,  it  is  easy  to  look  it  up  online!  A  discussion  of  the  scope  of  this  textbook  is  provided 
in  Section  1.3. 


1.1  Analog  or  Digital? 

Even  without  defining  information  formally,  we  intuitively  understand  that  speech,  audio,  and 
video  signals  contain  information.  We  use  the  term  message  signals  for  such  signals,  since  these 
are the messages we wish to convey over a communication system. In their original form (both
during generation and consumption), these message signals are analog: they are continuous-time
signals, with the signal values also lying in a continuum. When someone plays the violin,
an  analog  acoustic  signal  is  generated  (often  translated  to  an  analog  electrical  signal  using  a 
microphone).  Even  when  this  music  is  recorded  onto  a  digital  storage  medium  such  as  a  CD  (using 
the  digital  communication  framework  outlined  in  Section  1.1.2),  when  we  ultimately  listen  to  the 
CD  being  played  on  an  audio  system,  we  hear  an  analog  acoustic  signal.  The  transmitted  signals 
corresponding  to  physical  communication  media  are  also  analog.  For  example,  in  both  wireless 
and optical communication, we employ electromagnetic waves, which correspond to continuous-time
electric and magnetic fields taking values in a continuum.


1.1.1  Analog  communication 


Figure 1.1: Block diagram for an analog communication system. The modulator transforms
the message signal into the transmitted signal. The channel distorts and adds noise to the
transmitted signal. The demodulator extracts an estimate of the message signal from the received
signal arriving from the channel.


Given  the  analog  nature  of  both  the  message  signal  and  the  communication  medium,  a  natural 
design  choice  is  to  map  the  analog  message  signal  (e.g.,  an  audio  signal,  translated  from  the 
acoustic  to  electrical  domain  using  a  microphone)  to  an  analog  transmitted  signal  (e.g.,  a  radio 
wave  carrying  the  audio  signal)  that  is  compatible  with  the  physical  medium  over  which  we  wish 
to  communicate  (e.g.,  broadcasting  audio  over  the  air  from  an  FM  radio  station).  This  approach 
to  communication  system  design,  depicted  in  Figure  1.1,  is  termed  analog  communication.  Early 
communication  systems  were  all  analog:  examples  include  AM  (amplitude  modulation)  and  FM 
(frequency modulation) radio, analog television, first generation cellular phone technology (based
on FM), vinyl records, audio cassettes, and VHS or Beta videocassettes.

While analog communication might seem like the most natural option, it is in fact obsolete. Cellular
phone technologies from the second generation onwards are digital, vinyl records and audio
cassettes  have  been  supplanted  by  CDs,  and  videocassettes  by  DVDs.  Broadcast  technologies 
such  as  radio  and  television  are  often  slower  to  upgrade  because  of  economic  and  political  factors, 
but  digital  broadcast  radio  and  television  technologies  are  either  replacing  or  sidestepping  (e.g., 
via satellite) analog FM/AM radio and television broadcast. Let us now define what we mean by
digital  communication,  before  discussing  the  reasons  for  the  inexorable  trend  away  from  analog 
and  towards  digital  communication. 


1.1.2  Digital  communication 

The  conceptual  basis  for  digital  communication  was  established  in  1948  by  Claude  Shannon, 
when he founded the field of information theory. There are two main threads to this theory:

• Source coding and compression: Any information-bearing signal can be represented
efficiently, to within a desired accuracy of reproduction, by a digital signal (i.e., a discrete-time
signal taking values from a discrete set), which in its simplest form is just a sequence of binary
digits (zeros or ones), or bits. This is true whether the information source is text, speech,
audio or video. Techniques for performing the mapping from the original source signal to a bit
sequence  are  generically  termed  source  coding.  They  often  involve  compression,  or  removal  of 
redundancy,  in  a  manner  that  exploits  the  properties  of  the  source  signal  (e.g.,  the  heavy  spatial 
correlation  among  adjacent  pixels  in  an  image  can  be  exploited  to  represent  it  more  efficiently 
than a pixel-by-pixel representation).

• Digital information transfer: Once the source encoding is done, our communication task
reduces to reliably transferring the bit sequence at the output of the source encoder across space or
time,  without  worrying  about  the  original  source  and  the  sophisticated  tricks  that  have  been  used 
to  encode  it.  The  performance  of  any  communication  system  depends  on  the  relative  strengths 
of  the  signal  and  noise  or  interference,  and  the  distortions  imposed  by  the  channel.  Shannon 
showed  that,  once  we  fix  these  operational  parameters  for  any  communication  channel,  there 
exists  a  maximum  possible  rate  of  reliable  communication,  termed  the  channel  capacity.  Thus, 
given  the  information  bits  at  the  output  of  the  source  encoder,  in  principle,  we  can  transmit  them 
reliably  over  a  given  link  as  long  as  the  information  rate  is  smaller  than  the  channel  capacity, 
and we cannot transmit them reliably if the information rate is larger than the channel
capacity. This sharp transition between reliable and unreliable communication differs fundamentally
from  analog  communication,  where  the  quality  of  the  reproduced  source  signal  typically  degrades 
gradually  as  the  channel  conditions  get  worse. 
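The capacity statement above is abstract. One concrete instance, assuming the bit-flipping channel abstraction used in the channel encoder example later in this section (each bit flipped independently with probability p, usually called the binary symmetric channel), is the standard result that its capacity is 1 - H(p) bits per channel use, where H is the binary entropy function. The Python sketch below (Python rather than the book's Matlab; the function names are illustrative) just evaluates this formula; information-theoretic benchmarks are taken up properly in Chapter 7.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity in bits per channel use of a binary symmetric channel
    that flips each bit independently with probability p."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.01, 0.1, 0.5):
    print(f"flip probability {p:.2f}: capacity {bsc_capacity(p):.3f} bits/use")
```

For p = 0.01 this gives about 0.919 bits per channel use: in principle, suitably designed codes can carry about 0.92 information bits per transmitted bit with arbitrarily small error probability, whereas repeating each bit three times spends three channel uses per information bit and still leaves a nonzero error rate. At p = 0.5 the output is independent of the input and the capacity is zero.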

A  block  diagram  for  a  typical  digital  communication  system  based  on  these  two  threads  is  shown 
in  Figure  1.2.  We  now  briefly  describe  the  role  of  each  component,  together  with  simplified 
examples  of  its  function. 


Figure 1.2: Components of a digital communication system.


Source  encoder:  As  already  discussed,  the  source  encoder  converts  the  message  signal  into  a 
sequence  of  information  bits.  The  information  bit  rate  depends  on  the  nature  of  the  message 
signal  (e.g.,  speech,  audio,  video)  and  the  application  requirements.  Even  when  we  fix  the  class 
of  message  signals,  the  choice  of  source  encoder  is  heavily  dependent  on  the  setting.  For  example, 
video  signals  are  heavily  compressed  when  they  are  sent  over  a  cellular  link  to  a  mobile  device, 
but are lightly compressed when sent to a high definition television (HDTV) set. A cellular link
can support a much smaller bit rate than, say, the cable connecting a DVD player to an HDTV
set,  and  a  smaller  mobile  display  device  requires  lower  resolution  than  a  large  HDTV  screen.  In 
general,  the  source  encoder  must  be  chosen  such  that  the  bit  rate  it  generates  can  be  supported 
by  the  digital  communication  link  we  wish  to  transfer  information  over.  Other  than  this,  source 
coding  can  be  decoupled  entirely  from  link  design  (we  comment  further  on  this  a  bit  later). 
Example:  A  laptop  display  may  have  resolution  1024  x  768  pixels.  For  a  grayscale  digital  image, 
the  intensity  for  each  pixel  might  be  represented  by  8  bits.  Multiplying  by  the  number  of 
pixels  gives  us  about  6.3  million  bits,  or  about  0.8  Mbyte  (a  byte  equals  8  bits).  However, 
for  a  typical  image,  the  intensities  for  neighboring  pixels  are  heavily  correlated,  which  can  be 
exploited for significantly reducing the number of bits required to represent the image, without
noticeably distorting it. For example, one could take a two-dimensional Fourier transform, which
concentrates most of the information in the image at lower frequencies, and then discard many
of the high-frequency coefficients. There are other possible transforms one could use, and also
several more processing stages, but the bottom line is that, for natural images, state-of-the-art
image compression algorithms can provide 10X compression (i.e., reduction in the number of bits
relative  to  the  original  uncompressed  digital  image)  with  hardly  any  perceptual  degradation.  Far 
more  aggressive  compression  ratios  are  possible  if  we  are  willing  to  tolerate  more  distortion.  For 
video,  in  addition  to  the  spatial  correlation  exploited  for  image  compression,  we  can  also  exploit 
temporal  correlation  across  successive  frames. 
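The arithmetic in this example, and the idea of discarding high-frequency Fourier coefficients, can be checked with a short Python sketch. It is a toy under stated assumptions: the "image" is a hypothetical synthetic smooth pattern plus mild noise (not a natural photograph), the whole image is transformed at once, and we only count how many coefficients survive and how large the reconstruction error is. Practical codecs such as JPEG instead use a block-based discrete cosine transform followed by quantization and entropy coding, so the numbers here say nothing about real compression ratios.

```python
import numpy as np

# Raw size of the uncompressed 1024 x 768, 8-bit grayscale image from the example.
WIDTH, HEIGHT, BITS_PER_PIXEL = 1024, 768, 8
raw_bits = WIDTH * HEIGHT * BITS_PER_PIXEL   # 6,291,456 bits
raw_bytes = raw_bits // 8                     # 786,432 bytes, about 0.8 Mbyte


def lowpass_fft_compress(image, keep_fraction):
    """Zero all but the lowest-frequency 2-D FFT coefficients, then invert.

    keep_fraction is the fraction of frequencies kept along each axis, so
    roughly keep_fraction**2 of all coefficients survive. Returns the
    reconstructed image and the number of coefficients kept.
    """
    coeffs = np.fft.fft2(image)
    rows, cols = image.shape
    kr = max(1, int(rows * keep_fraction / 2))
    kc = max(1, int(cols * keep_fraction / 2))
    mask = np.zeros(coeffs.shape, dtype=bool)
    # In an unshifted FFT array, the low frequencies sit in the four corners.
    mask[:kr, :kc] = True
    mask[:kr, -kc:] = True
    mask[-kr:, :kc] = True
    mask[-kr:, -kc:] = True
    # The mask is not exactly conjugate-symmetric, so keep only the real part.
    recon = np.real(np.fft.ifft2(coeffs * mask))
    return recon, int(mask.sum())


# Hypothetical stand-in for a natural image: a smooth pattern plus mild noise,
# so that neighboring pixels are strongly correlated.
rng = np.random.default_rng(0)
y, x = np.mgrid[0:HEIGHT, 0:WIDTH]
image = (128
         + 60 * np.sin(2 * np.pi * 3 * x / WIDTH) * np.cos(2 * np.pi * 2 * y / HEIGHT)
         + rng.normal(0.0, 2.0, size=(HEIGHT, WIDTH)))

recon, kept = lowpass_fft_compress(image, keep_fraction=0.1)
rel_err = np.linalg.norm(recon - image) / np.linalg.norm(image)
print(f"raw size: {raw_bits} bits = {raw_bytes} bytes")
print(f"kept {kept} of {image.size} coefficients ({100 * kept / image.size:.1f}%), "
      f"relative error {rel_err:.3f}")
```

With keep_fraction=0.1, about 1% of the coefficients are kept. Because this synthetic image concentrates its energy at a few low frequencies, the relative error stays near the level of the added noise; a real image has significant energy at many more frequencies, so the trade-off is less favorable.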

Channel  encoder:  The  channel  encoder  adds  redundancy  to  the  information  bits  obtained 
from  the  source  encoder,  in  order  to  facilitate  error  recovery  after  transmission  over  the  channel. 
It  might  appear  that  we  are  putting  in  too  much  work,  adding  redundancy  just  after  the  source 
encoder  has  removed  it.  However,  the  redundancy  added  by  the  channel  encoder  is  tailored  to 
the  channel  over  which  information  transfer  is  to  occur,  whereas  the  redundancy  in  the  original 
message  signal  is  beyond  our  control,  so  that  it  would  be  inefficient  to  keep  it  when  we  transmit 
the  signal  over  the  channel. 

Example:  The  noise  and  distortion  introduced  by  the  channel  can  cause  errors  in  the  bits  we 
send  over  it.  Consider  the  following  abstraction  for  a  channel:  we  can  send  a  string  of  bits  (zeros 
or  ones)  over  it,  and  the  channel  randomly  flips  each  bit  with  probability  0.01  (i.e.,  the  channel 
has  a  1%  error  rate).  If  we  cannot  tolerate  this  error  rate,  we  could  repeat  each  bit  that  we  wish 
to  send  three  times,  and  use  a  majority  rule  to  decide  on  its  value.  Now,  we  only  make  an  error 
if  two  or  more  of  the  three  bits  are  flipped  by  the  channel.  It  is  left  as  an  exercise  to  calculate 
that  an  error  now  happens  with  probability  approximately  0.0003  (i.e.,  the  error  rate  has  gone 
down  to  0.03%).  That  is,  we  have  improved  performance  by  introducing  redundancy.  Of  course, 
there are far more sophisticated and efficient techniques for introducing redundancy than the simple
repetition  strategy  just  described;  see  Chapter  7. 
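To check the repetition-code numbers, note that majority voting fails when exactly two of the three copies are flipped (three ways) or when all three are, so the error probability is 3p^2(1 - p) + p^3. The Python sketch below computes this and cross-checks it with a Monte Carlo simulation of the bit-flipping channel; the function names, seed, and number of trials are arbitrary choices for illustration.

```python
import random

def repetition3_error_prob(p):
    """Exact probability that a majority vote over 3 copies decodes wrongly:
    exactly two flips (3 ways) or all three flipped."""
    return 3 * p**2 * (1 - p) + p**3

def simulate_repetition3(p, num_bits, seed=1):
    """Monte Carlo estimate: send each bit three times over a channel that
    flips each copy independently with probability p, then majority-vote."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(num_bits):
        bit = rng.randint(0, 1)
        received = [bit ^ (rng.random() < p) for _ in range(3)]  # XOR applies the flips
        decoded = 1 if sum(received) >= 2 else 0
        errors += decoded != bit
    return errors / num_bits

p = 0.01
exact = repetition3_error_prob(p)          # 3(0.01)^2(0.99) + (0.01)^3
estimate = simulate_repetition3(p, 200_000)
print(f"exact {exact:.6f}, simulated {estimate:.6f}")
```

For p = 0.01 the exact value is 0.000298, consistent with the approximate 0.0003 quoted above. The simulated estimate fluctuates around it, since only about 60 errors are expected in 200,000 bits.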

Modulator:  The  modulator  maps  the  coded  bits  at  the  output  of  the  channel  encoder  to  a 
transmitted  signal  to  be  sent  over  the  channel.  For  example,  we  may  insist  that  the  transmitted 
signal fit within a given frequency band and adhere to stringent power constraints in a wireless
system,  where  interference  between  users  and  between  co-existing  systems  is  a  major  concern. 
Unlicensed  WiFi  transmissions  typically  occupy  20-40  MHz  of  bandwidth  in  the  2.4  or  5  GHz 
bands.  Transmissions  in  fourth  generation  cellular  systems  may  often  occupy  bandwidths  ranging 
from  1-20  MHz  at  frequencies  ranging  from  700  MHz  to  3  GHz.  While  these  signal  bandwidths 
are being increased in an effort to increase data rates (e.g., up to 160 MHz for emerging WiFi
standards,  and  up  to  100  MHz  for  emerging  cellular  standards),  and  new  frequency  bands  are 
being  actively  explored  (see  the  epilogue  for  more  discussion),  the  transmitted  signal  still  needs 
to be shaped to fit within certain spectral constraints.

Example:  Suppose  that  we  send  bit  value  0  by  transmitting  the  signal  s(t),  and  bit  value  1  by 
transmitting -s(t). Even for this simple example, we must design the signal s(t) so it fits within
spectral constraints (e.g., two different users may use two different segments of spectrum to avoid
interfering with each other), and we must figure out how to prevent successive bits of the same
user  from  interfering  with  each  other.  For  wireless  communication,  these  signals  are  voltages 
generated by circuits coupled to antennas, and are ultimately emitted as electromagnetic waves
from  the  antennas. 
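A minimal discrete-time sketch of this mapping, assuming a rectangular pulse sampled at a hypothetical SAMPLES_PER_BIT = 8 samples per bit as a stand-in for s(t). Giving each bit its own non-overlapping time slot keeps successive bits from interfering, but the sharp edges of a rectangular pulse spread its spectrum widely, which is exactly the tension the example points to; better pulse choices are the subject of Chapter 4. The correlation check at the end is only a noiseless sanity test, not a demodulator design.

```python
import numpy as np

# Hypothetical discrete-time stand-in for s(t): a rectangular pulse of
# SAMPLES_PER_BIT samples (the simplest choice, not a recommended one).
SAMPLES_PER_BIT = 8
s = np.ones(SAMPLES_PER_BIT)

def antipodal_modulate(bits, pulse):
    """Send bit 0 as +pulse and bit 1 as -pulse, one bit per time slot.

    The slots do not overlap, so successive bits cannot interfere, but the
    rectangular pulse occupies a wide band of frequencies.
    """
    amplitudes = 1 - 2 * np.asarray(bits)   # 0 -> +1, 1 -> -1
    return np.kron(amplitudes, pulse)       # [a0*pulse, a1*pulse, ...]

def correlate_and_decide(signal, pulse):
    """Noiseless sanity check: correlate each slot with the pulse and map a
    positive correlation to bit 0 and a negative one to bit 1."""
    slots = signal.reshape(-1, len(pulse))
    return [0 if c > 0 else 1 for c in slots @ pulse]

bits = [0, 1, 1, 0]
tx = antipodal_modulate(bits, s)
print(tx.reshape(len(bits), SAMPLES_PER_BIT)[:, 0])
print(correlate_and_decide(tx, s))
```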

The channel encoder and modulator are typically jointly designed, keeping in mind the
anticipated channel conditions, and the result is termed a coded modulator.

Channel: The channel distorts and adds noise, and possibly interference, to the transmitted
signal. Much of our success in developing communication technologies has resulted from being able
to optimize communication strategies based on accurate mathematical models for the channel.
Such models are typically statistical, and are developed with significant effort using a
combination of measurement and computation. The physical characteristics of the communication
medium vary widely, and hence so do the channel models. Wireline channels are typically well
modeled as linear and time-invariant, while optical fiber channels exhibit nonlinearities. Wireless
mobile channels are particularly challenging because of the time variations caused by mobility,
and because of the potential for interference arising from the broadcast nature of the medium. The link
design  also  depends  on  system-level  characteristics,  such  as  whether  or  not  the  transmitter  has 
feedback  regarding  the  channel,  and  what  strategy  is  used  to  manage  interference. 

Example: Consider communication between a cellular base station and a mobile device. The
electromagnetic waves emitted by the base station can reach the mobile’s antennas through multiple
paths, including bounces off streets and building surfaces. The received signal at the mobile can
be modeled as multiple copies of the transmitted signal with different gains and delays. These
gains and delays change due to mobility, but the rate of change is often slow compared to the
data rate; hence, over short intervals, we can get away with modeling the channel as a linear
time-invariant system that the transmitted signal goes through before arriving at the receiver.
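Over a short interval, the multipath model in this example is just an LTI filter: the received signal is the transmitted signal convolved with an impulse response made of scaled, delayed impulses, one per path. The minimal sketch below works in discrete time and assumes integer sample delays; the function name and the three path gains and delays are hypothetical values chosen for illustration, not a measured channel.

```python
def multipath_channel(x, paths):
    """Pass the samples x through a multipath channel.

    paths is a list of (gain, delay_in_samples) pairs. Each path adds a
    scaled, delayed copy of x, so the output equals the convolution of x
    with h[n] = sum_k gain_k * delta[n - delay_k].
    """
    max_delay = max(delay for _, delay in paths)
    y = [0.0] * (len(x) + max_delay)
    for gain, delay in paths:
        for n, sample in enumerate(x):
            y[n + delay] += gain * sample
    return y


# Hypothetical 3-path channel: a direct path plus two weaker reflections
paths = [(1.0, 0), (0.5, 2), (-0.25, 3)]
impulse = [1.0, 0.0, 0.0, 0.0]           # an impulse reveals the channel response
print(multipath_channel(impulse, paths))  # [1.0, 0.0, 0.5, -0.25, 0.0, 0.0, 0.0]
```

Feeding in a single impulse returns the channel impulse response itself; any other input sequence comes out as the superposition of its delayed, scaled copies.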

Demodulator:  The  demodulator  processes  the  signal  received  from  the  channel  to  produce  bit 
estimates  to  be  fed  to  the  channel  decoder.  It  typically  performs  a  number  of  signal  processing 
tasks,  such  as  synchronization  of  phase,  frequency  and  timing,  and  compensating  for  distortions 
induced  by  the  channel. 

Example: Consider the simplest possible channel model, where the channel just adds noise to
the transmitted signal. In our earlier example of sending ±s(t) to send 0 or 1, the demodulator
must guess, based on the noisy received signal, which of these two options is true. It might
make a hard decision (e.g., decide that 0 was sent), or hedge its bets and make a soft
decision, saying, for example, that it is 80% sure that the transmitted bit is a zero. There are
a host of other functions that we have swept under the rug: before making any decisions, the
demodulator has to perform functions such as synchronization (making sure that the receiver’s
notion of time and frequency is consistent with the transmitter’s) and equalization (compensating
for the distortions due to the channel).
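A minimal sketch of hard versus soft decisions for the ±s(t) example, under assumptions that go beyond the text: the receiver has already reduced the received signal to a single number z (for instance by correlating it against s(t)), so that z = +A + n when 0 is sent and z = −A + n when 1 is sent, with Gaussian noise n of variance σ² and equally likely bits. Under these assumptions, the posterior probability that 0 was sent works out to 1/(1 + exp(−2Az/σ²)). The function names and numbers are illustrative.

```python
import math


def hard_decision(z):
    """Guess the bit: a positive statistic favors +s(t), i.e., bit 0."""
    return 0 if z >= 0 else 1


def soft_decision(z, amplitude, noise_var):
    """Posterior probability that bit 0 was sent, given the statistic z.

    Assumes z = +amplitude + noise (bit 0) or -amplitude + noise (bit 1),
    Gaussian noise of variance noise_var, and equally likely bits.
    (Very large |z| can overflow math.exp; fine for this illustration.)
    """
    return 1.0 / (1.0 + math.exp(-2.0 * amplitude * z / noise_var))


z = 0.35
print(hard_decision(z))                                          # 0
print(round(soft_decision(z, amplitude=1.0, noise_var=0.5), 3))  # 0.802
```

With A = 1 and σ² = 0.5, the statistic z = 0.35 yields a hard decision of 0 and a soft decision of roughly 80% confidence in a zero, in the spirit of the example above.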

Channel  decoder:  The  channel  decoder  processes  the  imperfect  bit  estimates  provided  by 
the  demodulator,  and  exploits  the  controlled  redundancy  introduced  by  the  channel  encoder  to 
estimate  the  information  bits. 

Example: The channel decoder takes the guesses from the demodulator and uses the redundancies
in the channel code to clean up the decisions. In our simple example of repeating every bit three
times, it might use a majority rule to make its final decision if the demodulator is putting out
hard decisions. For soft decisions, it might use more sophisticated combining rules with improved
performance.
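To illustrate why soft combining can beat the majority rule, here is a minimal sketch for the three-fold repetition code. It assumes that each copy of the bit yields a statistic z = ±A + n, with positive values favoring 0 and independent Gaussian noise of the same variance on each copy. Under that model each z is proportional to the log-likelihood ratio of its copy, so summing the statistics and taking the sign is one natural combining rule. The function names and numbers are hypothetical.

```python
def majority_decode(hard_bits):
    """Decode a repetition code from hard bit decisions by majority vote."""
    return 1 if 2 * sum(hard_bits) > len(hard_bits) else 0


def soft_combine_decode(statistics):
    """Decode from soft statistics (positive favors 0) by summing them.

    Under the assumed Gaussian model with equal noise variance on each
    copy, each statistic is a scaled log-likelihood ratio, so the sum
    pools the evidence from all copies before deciding.
    """
    return 0 if sum(statistics) >= 0 else 1


z = [0.9, -0.2, -0.1]   # bit 0 was sent; noise pushed two copies slightly negative
hard = [0 if zi >= 0 else 1 for zi in z]
print(hard, majority_decode(hard))   # [0, 1, 1] 1   (wrong)
print(soft_combine_decode(z))        # 0             (correct)
```

The two weakly negative statistics outvote the strongly positive one under the majority rule, while soft combining lets the confident copy dominate.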

While we have described the demodulator and decoder as operating separately and in sequence
for simplicity, there can be significant benefits from iterative information exchange between the
two. In addition, for certain coded modulation strategies in which channel coding and modulation
are tightly coupled, the demodulator and channel decoder may be integrated into a single entity.

Source  decoder:  The  source  decoder  processes  the  estimated  information  bits  at  the  output 
of  the  channel  decoder  to  obtain  an  estimate  of  the  message.  The  message  format  may  or  may 
not  be  the  same  as  that  of  the  original  message  input  to  the  source  encoder:  for  example,  the 
source  encoder  may  translate  speech  to  text  before  encoding  into  bits,  and  the  source  decoder 
may output a text message to the end user.

Example:  For  the  example  of  a  digital  image  considered  earlier,  the  compressed  image  can  be 
translated  back  to  a  pixel-by-pixel  representation  by  taking  the  inverse  spatial  Fourier  transform 
of  the  coefficients  that  survived  the  compression. 
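A minimal sketch of the transform-coding idea in this example, using NumPy's 2D FFT on a synthetic image: keep only the largest-magnitude spatial Fourier coefficients, then take the inverse transform of the survivors. This illustrates the principle only; practical image codecs (e.g., JPEG) use a blockwise discrete cosine transform together with quantization and entropy coding, and the helper names and test image here are hypothetical.

```python
import numpy as np


def compress(image, keep_fraction):
    """Zero all but the largest-magnitude 2D DFT coefficients of an image."""
    coeffs = np.fft.fft2(image)
    n_keep = max(1, int(keep_fraction * coeffs.size))
    # Magnitude of the n_keep-th largest coefficient serves as the threshold
    threshold = np.sort(np.abs(coeffs), axis=None)[-n_keep]
    return np.where(np.abs(coeffs) >= threshold, coeffs, 0)


def decompress(coeffs):
    """Inverse 2D DFT of the surviving coefficients, back to pixel values."""
    return np.real(np.fft.ifft2(coeffs))


# Synthetic 32x32 "image": a filled disk on a dark background
rows, cols = np.mgrid[0:32, 0:32]
image = (np.hypot(rows - 15.5, cols - 15.5) < 10).astype(float)

kept = compress(image, keep_fraction=0.1)
restored = decompress(kept)
print(np.linalg.norm(restored - image) / np.linalg.norm(image))
```

Because the DFT preserves energy up to a constant factor, keeping more coefficients can only reduce the squared reconstruction error; keeping all of them recovers the image exactly.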

We  are  now  ready  to  compare  analog  and  digital  communication,  and  discuss  why  the  trend 
towards  digital  is  inevitable. 


1.1.3  Why  digital? 

Comparing the block diagrams for analog and digital communication in Figures 1.1 and 1.2,
respectively, we see that the digital communication system involves far more processing. However,
this is not an obstacle for modern transceiver design, due to the exponential increase in
the computational power of low-cost silicon integrated circuits. Digital communication has the
following key advantages.

Optimality: For a point-to-point link, it is optimal to separately optimize source coding and
channel coding, as long as we do not mind the delay and processing incurred in doing so. Due
to this source-channel separation principle, we can leverage the best available source codes and
the best available channel codes in designing a digital communication system, independently
of each other. Efficient source encoders must be highly specialized. For example, state-of-the-art
speech encoders, video compression algorithms, or text compression algorithms are very
different from each other, and are each the result of significant effort over many years by a large
community of researchers. However, once source encoding is performed, the coded modulation
scheme used over the communication link can be engineered to transmit the information bits
reliably, regardless of what kind of source they correspond to, with the bit rate limited only
by the channel and transceiver characteristics. Thus, the design of a digital communication
link is source-independent and channel-optimized. In contrast, the waveform transmitted in an
analog communication system depends on the message signal, which is beyond the control of the
link designer; hence, we do not have the freedom to optimize link performance over all possible
communication schemes. This is not just a theoretical observation: in practice, huge performance
gains are obtained by switching from analog to digital communication.

Scalability: While Figure 1.2 shows a single digital communication link between source encoder
and decoder, under the source-channel separation principle, there is nothing preventing
us from inserting additional links, putting the source encoder and decoder at the end points.
This is because digital communication allows ideal regeneration of the information bits, hence
every time we add a link, we can focus on communicating reliably over that particular link. (Of
course, information bits do not always get through reliably, hence we typically add error recovery
mechanisms such as retransmission, at the level of an individual link or end-to-end.) Another
consequence of the source-channel separation principle is that, since information bits are
transported without interpretation, the same link can be used to carry multiple kinds of messages. A
particularly useful approach is to chop the information bits up into discrete chunks, or packets,
which can then be processed independently on each link. These properties of digital communication
are critical for enabling massively scalable, general-purpose communication networks such
as the Internet. Such networks can have large numbers of digital communication links, possibly
with different characteristics, independently engineered to provide bit pipes that support the
required data rates. Messages of various kinds, after source encoding, are reduced to packets, and these
packets are switched along different paths through the network, depending on the identities of the
source and destination nodes, and the loads on different links in the network. None of this would
be possible with analog communication: link performance in an analog communication system
depends on message properties, and successive links incur noise accumulation, which limits the
number of links that can be cascaded.
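To make the contrast between regeneration and noise accumulation concrete, here is a back-of-the-envelope sketch under simplified assumptions that are not part of the text: every hop adds independent noise of the same variance; analog repeaters simply restore the signal level, so noise variances add; and each digital hop sends ±A in Gaussian noise and makes a fresh hard decision, flipping the bit with probability Q(√SNR). The function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def analog_snr_after(hops, snr_per_hop):
    """Analog relaying: noise from every hop accumulates, so SNR falls as 1/hops."""
    return snr_per_hop / hops


def digital_error_after(hops, snr_per_hop):
    """Digital relaying: each hop regenerates the bit from a fresh decision.

    A hop flips the bit with probability p = Q(sqrt(SNR)); the end-to-end
    bit is wrong only if an odd number of hops flip it, which happens with
    probability (1 - (1 - 2p)^hops) / 2. log1p/expm1 keep this accurate
    when p is tiny.
    """
    p = q_function(math.sqrt(snr_per_hop))
    return -math.expm1(hops * math.log1p(-2.0 * p)) / 2.0


snr = 10.0  # illustrative per-hop SNR of 10 (i.e., 10 dB)
for hops in (1, 10, 100):
    print(hops, analog_snr_after(hops, snr), digital_error_after(hops, snr))
```

Under these assumptions, ten analog hops drive the SNR from 10 down to 1 (0 dB), while ten digital hops keep the bit error probability below 1e-2.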

The preceding makes it clear that source-channel separation is crucial in the formation and growth
of  modern  communication  networks.  It  is  worth  noting,  however,  that  joint  source-channel  design 
can  provide  better  performance  in  some  settings,  especially  when  there  are  constraints  on  delay 
or  complexity,  or  if  multiple  users  are  being  supported  simultaneously  on  a  given  communication 
medium.  In  practice,  this  means  that  “local”  violations  of  the  separation  principle  (e.g.,  over  a 
wireless  last  hop  in  a  communication  network)  may  be  a  useful  design  trick. 


1.2  A  Technology  Perspective 

We  now  discuss  some  technology  trends  and  concepts  that  have  driven  the  astonishing  growth 
in  communication  systems  in  the  past  two  decades,  and  that  are  expected  to  impact  future 
developments  in  this  area.  Our  discussion  is  structured  in  terms  of  big  technology  “stories.” 

Technology  story  1:  The  Internet.  Some  of  the  key  ingredients  that  contributed  to  its 
growth  and  the  essential  role  it  plays  in  our  lives  are  as  follows: 

•  Any  kind  of  message  can  be  chopped  up  into  packets  and  routed  across  the  network,  using  an 
Internet  Protocol  (IP)  that  is  simple  to  implement  in  software; 

• Advances in optical fiber communication and high-speed digital hardware enable a super-fast
“core” of routers connected by very high-speed, long-range links that enable world-wide coverage;

•  The  World  Wide  Web,  or  web,  makes  it  easy  to  organize  information  into  interlinked  hypertext 
documents  which  can  be  browsed  from  anywhere  in  the  world; 

•  The  digitization  of  content  (audio,  video,  books)  means  that  ultimately  “all”  information  is 
expected  to  be  available  on  the  web; 

•  Search  engines  enable  us  to  efficiently  search  for  this  information; 

•  Connectivity  applications  such  as  email,  teleconferencing,  videoconferencing  and  online  social 
networks  have  become  indispensable  in  our  daily  lives. 

Technology  story  2:  Wireless.  Cellular  mobile  networks  are  everywhere,  and  are  based  on 
the  breakthrough  concept  that  ubiquitous  tetherless  connectivity  can  be  provided  by  breaking 
the  world  into  cells,  with  “spatial  reuse”  of  precious  spectrum  resources  in  cells  that  are  “far 
enough”  apart.  Base  stations  serve  mobiles  in  their  cells,  and  hand  them  off  to  adjacent  base 
stations  when  the  mobile  moves  to  another  cell.  While  cellular  networks  were  invented  to  support 
voice  calls  for  mobile  users,  today’s  mobile  devices  (e.g.,  “smart  phones”  and  tablet  computers) 
are  actually  powerful  computers  with  displays  large  enough  for  users  to  consume  video  on  the  go. 
Thus,  cellular  networks  must  now  support  seamless  access  to  the  Internet.  The  billions  of  mobile 
devices  in  use  easily  outnumber  desktop  and  laptop  computers,  so  that  the  most  important  parts 
of  the  Internet  today  are  arguably  the  cellular  networks  at  its  edge.  Mobile  service  providers  are 
having  great  difficulty  keeping  up  with  the  increase  in  demand  resulting  from  this  convergence 
of  cellular  and  Internet;  by  some  estimates,  the  capacity  of  cellular  networks  must  be  scaled  up 
by  several  orders  of  magnitude,  at  least  in  densely  populated  urban  areas!  As  discussed  in  the 
epilogue,  a  major  challenge  for  the  communication  researcher  and  technologist,  therefore,  is  to 
come  up  with  the  breakthroughs  required  to  deliver  such  capacity  gains. 

Another major success in wireless is WiFi, a catchy term for a class of standardized wireless
local area network (WLAN) technologies based on the IEEE 802.11 family of standards. Currently,
WiFi networks use unlicensed spectrum in the 2.4 and 5 GHz bands, and have come into
widespread use in both residential and commercial environments. WiFi transceivers are now
incorporated into almost every computer and mobile device. One way of alleviating the cellular
capacity crunch that was just mentioned is to offload Internet access to the nearest WiFi network.
Of course, since different WiFi networks are often controlled by different entities, seamless
switching between cellular and WiFi is not always possible.

It  is  instructive  to  devote  some  thought  to  the  contrast  between  cellular  and  WiFi  technologies. 
Cellular  transceivers  and  networks  are  far  more  tightly  engineered.  They  employ  spectrum  that 
mobile operators pay a great deal of money to license, hence it is critical to use this spectrum
efficiently.  Furthermore,  cellular  networks  must  provide  robust  wide-area  coverage  in  the  face  of 
rapid  mobility  (e.g.,  automobiles  at  highway  speeds).  In  contrast,  WiFi  uses  unlicensed  (i.e.,  free!) 
spectrum,  must  only  provide  local  coverage,  and  typically  handles  much  slower  mobility  (e.g., 
pedestrian  motion  through  a  home  or  building).  As  a  result,  WiFi  can  be  more  loosely  engineered 
than  cellular.  It  is  interesting  to  note  that  despite  the  deployment  of  many  uncoordinated 
WiFi  networks  in  an  unlicensed  setting,  WiFi  typically  provides  acceptable  performance,  partly 
because  the  relatively  large  amount  of  unlicensed  spectrum  (especially  in  the  5  GHz  band)  allows 
for  channel  switching  when  encountering  excessive  interference,  and  partly  because  of  naturally 
occurring spatial reuse (WiFi networks that are “far enough” apart do not interfere with each other).
Of  course,  in  densely  populated  urban  environments  with  many  independently  deployed  WiFi 
networks,  the  performance  can  deteriorate  significantly,  a  phenomenon  sometimes  referred  to  as 
a  tragedy  of  the  commons  (individually  selfish  behavior  leading  to  poor  utilization  of  a  shared 
resource).  As  we  briefly  discuss  in  the  epilogue,  both  the  cellular  and  WiFi  design  paradigms 
need  to  evolve  to  meet  our  future  needs. 

Technology story 3: Moore’s law. Moore’s “law” is actually an empirical observation attributed
to Gordon Moore, one of the co-founders of Intel Corporation. It can be paraphrased as
saying that the density of transistors in an integrated circuit, and hence the amount of computation
per unit cost, can be expected to increase exponentially over time. This observation has
become a self-fulfilling prophecy, because it has been taken up by the semiconductor industry
as a growth benchmark driving their technology roadmap. While Moore’s law may be slowing
down somewhat, it has already had a spectacular impact on the communications industry by
drastically lowering the cost and increasing the speed of digital computation. By converting
analog signals to the digital domain as soon as possible, advanced transceiver algorithms can
be implemented in digital signal processing (DSP) using low-cost integrated circuits, so that
research breakthroughs in coding and modulation can be quickly transitioned into products. This
leads to economies of scale that have been critical to the growth of mass market products in both
wireless (e.g., cellular and WiFi) and wireline (e.g., cable modems and DSL) communication.


Figure 1.3: The Internet has a core of routers and servers connected by high-speed fiber links,
with wireless networks hanging off the edge (figure courtesy Aseem Wadhwa).


How  do  these  stories  come  together?  The  sketch  in  Figure  1.3  highlights  key  building  blocks 
of  the  Internet  today.  The  core  of  the  network  consists  of  powerful  routers  that  direct  packets 
of  data  from  an  incoming  edge  to  an  outgoing  edge,  and  servers  (often  housed  in  large  data 
centers)  that  serve  up  content  requested  by  clients  such  as  personal  computers  and  mobile  devices. 
The  elements  in  the  core  network  are  connected  by  high-speed  optical  fiber.  Wireless  can  be 
viewed  as  hanging  off  the  edge  of  the  Internet.  Wide  area  cellular  networks  may  have  worldwide 
coverage,  but  each  base  station  is  typically  connected  by  a  high-speed  link  to  the  wired  Internet. 
WiFi networks are wireless local area networks, typically deployed indoors (but potentially also
providing outdoor coverage for low-mobility scenarios) in homes and office buildings, connected to
the Internet via last mile links, which might run over copper wires (a legacy of wired telephony,
with transceivers typically upgraded to support broadband Internet access) or coaxial cable
(originally deployed to deliver cable television, but now also providing broadband Internet access).
Some areas have been upgraded to optical fiber to the curb or even to the home, while some
others might be remote enough to require wireless last mile solutions.


Figure 1.4: A segment of a cellular network with idealized hexagonal shapes (figure courtesy
Aseem Wadhwa).


Let us now zoom in on cellular networks. Figure 1.4 shows three adjacent cells in a cellular network
with hexagonal cells. A working definition of a cell is that it is the area around a base station
where  the  signal  strength  is  higher  than  that  from  other  base  stations.  Of  course,  under  realistic 
propagation  conditions,  cells  are  never  hexagonal,  but  the  concept  of  spatial  reuse  still  holds:  the 
interference  between  distant  cells  can  be  neglected,  hence  they  can  use  the  same  communication 
resources.  For  example,  in  Figure  1.4,  we  might  decide  to  use  three  different  frequency  bands 
in  the  three  cells  shown,  but  might  then  reuse  these  bands  in  other  cells.  Figure  1.4  also  shows 
that  a  user  may  be  simultaneously  in  range  of  multiple  base  stations  when  near  cell  boundaries. 
Crossing  these  boundaries  may  result  in  a  handoff  from  one  base  station  to  another.  In  addition, 
near  cell  boundaries,  a  mobile  device  may  be  in  communication  with  multiple  base  stations 
simultaneously,  a  concept  known  as  soft  handoff. 
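The three-band reuse pattern described for Figure 1.4 can be sketched on an idealized hexagonal grid. The sketch assumes axial coordinates (q, r) for hexagonal cells, a common grid-indexing convention that is not part of the text, and assigns band (q − r) mod 3 to each cell; the helper names are hypothetical. A short check confirms that no two adjacent cells share a band, while cells two steps apart can reuse one.

```python
# Axial-coordinate offsets from a hexagonal cell to its six neighbors
HEX_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def band_for_cell(q, r):
    """Assign one of three frequency bands (0, 1, 2) to the cell at (q, r).

    Stepping to any neighbor changes q - r by +/-1 or +/-2, never by a
    multiple of 3, so adjacent cells always get different bands.
    """
    return (q - r) % 3


def adjacent_cells_share_band(radius):
    """Check every cell in a (2*radius+1)^2 patch of axial coordinates."""
    for q in range(-radius, radius + 1):
        for r in range(-radius, radius + 1):
            for dq, dr in HEX_NEIGHBORS:
                if band_for_cell(q, r) == band_for_cell(q + dq, r + dr):
                    return True
    return False


print(adjacent_cells_share_band(radius=5))      # False
print(band_for_cell(0, 0), band_for_cell(1, 1))  # 0 0: band reused two cells away
```

Real deployments use more elaborate reuse plans, since actual cells are not hexagonal, but the principle of giving neighbors distinct resources and reusing them at a distance is the same.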

It  is  useful  for  a  communication  system  designer  to  be  aware  of  the  preceding  “big  picture”  of 
technology  trends  and  network  architectures  in  order  to  understand  how  to  direct  his  or  her 
talents  as  these  systems  continue  to  evolve  (the  epilogue  contains  more  detailed  speculation 
regarding this evolution). However, the first order of business is to acquire the fundamentals
required to get going in this field. These are quite simply stated: a communication system
designer must be comfortable with mathematical modeling (in order to understand the state of
the art, as well as to devise new models as required), and with devising and evaluating signal
processing algorithms based on these models. The goal of this textbook is to provide a first
exposure to such a technical background.


1.3 Scope of this Textbook


Referring  to  the  block  diagram  of  a  digital  communication  system  in  Figure  1.2,  our  focus  in 
this  textbook  is  to  provide  an  introduction  to  design  of  a  digital  communication  link  as  shown 
inside the dashed box. While we are primarily interested in digital communication, circuit
designers implementing such systems must deal with analog waveforms, hence we believe that a
rudimentary  background  in  analog  communication  techniques,  as  provided  in  this  book,  is  useful 
for  the  communication  system  designer.  We  do  not  discuss  source  encoding  and  decoding  in 
this  book;  these  topics  are  highly  specialized  and  technical,  and  doing  them  justice  requires  an 
entire  textbook  of  its  own  at  the  graduate  level.  A  detailed  outline  of  the  book  is  provided  in 
the  preface,  hence  we  restrict  ourselves  here  to  summarizing  the  roles  of  the  various  chapters: 

Chapter 2: introduces the signal processing background required for DSP-centric implementations
of communication transceivers;

Chapter  3:  provides  just  enough  background  on  analog  communication  techniques  (can  be 
skipped  if  only  focused  on  digital  communication); 

Chapter  4:  discusses  digital  modulation  techniques; 

Chapter  5:  provides  the  probability  background  required  for  receiver  design,  including  noise 
modeling; 

Chapter  6:  discusses  design  and  performance  analysis  of  demodulators  in  digital  communication 
systems  for  idealized  link  models; 

Chapter 7: provides an initial exposure to channel coding techniques and benchmarks;

Chapter 8: provides an introduction to approaches for handling channel dispersion, and to
multiple-antenna communication;

Epilogue:  discusses  emerging  trends  shaping  research  and  development  in  communications. 

Chapters  2,  4  and  6  are  core  material  that  must  be  mastered  (much  of  Chapter  5  is  also  core 
material,  but  some  readers  may  already  have  enough  probability  background  that  they  can  skip, 
or  skim,  it).  Chapter  3  is  highly  recommended  for  communication  system  designers  with  interest 
in  radio  frequency  circuit  design,  since  it  highlights,  at  a  high  level,  some  of  the  ideas  and  issues 
that come up there. Chapters 7 and 8 are independent of each other, and contain more advanced
material that may not always fit within an undergraduate curriculum. They contain “hands-on”
introductions  to  these  topics  via  code  fragments  and  software  labs  that  hopefully  encourage  the 
reader  to  explore  further. 


1.4  Concept  Inventory 

The  goal  of  this  chapter  is  to  provide  an  intellectual  framework  and  motivation  for  the  rest  of 
this  textbook.  Some  of  the  key  concepts  are  as  follows. 

•  Communication  refers  to  information  transfer  across  either  space  or  time,  where  the  latter 
refers  to  storage  media. 

• Signals carrying information and signals that can be sent over a communication medium are
both inherently analog (i.e., continuous-time, continuous-valued).

•  Analog  communication  corresponds  to  transforming  an  analog  message  waveform  directly  into 
an  analog  transmitted  waveform  at  the  transmitter,  and  undoing  this  transformation  at  the 
receiver. 

• Digital communication corresponds to first reducing message waveforms to information bits,
and then transporting these bits over the communication channel.

•  Digital  communication  requires  the  following  steps:  source  encoding  and  decoding,  modulation 
and  demodulation,  channel  encoding  and  decoding. 

• While digital communication requires more processing steps than analog communication, it has
the advantages of optimality and scalability, hence there is an unstoppable trend from analog to
digital.

•  The  growth  in  communication  has  been  driven  by  major  technology  stories  including  the 
Internet,  wireless  and  Moore’s  law. 

•  Key  components  of  the  communication  system  designer’s  toolbox  are  mathematical  modeling 
and  signal  processing. 


1.5  Endnotes 

There  are  a  large  number  of  textbooks  on  communication  systems  at  both  the  undergraduate  and 
graduate  level.  Undergraduate  texts  include  Haykin  [1],  Proakis  and  Salehi  [2],  Pursley  [3],  and 
Ziemer and Tranter [4]. Graduate texts, which typically focus on digital communication, include
Barry, Lee and Messerschmitt [5], Benedetto and Biglieri [6], Madhow [7], and Proakis and Salehi
[8]. The first coherent exposition of the modern theory of communication receiver design is in
the classical (graduate level) textbook by Wozencraft and Jacobs [9]. Other important classical
graduate  level  texts  are  Viterbi  and  Omura  [10]  and  Blahut  [11].  More  specialized  references  (e.g., 
on  signal  processing,  information  theory,  channel  coding,  wireless  communication)  are  mentioned 
in  later  chapters.  In  addition  to  these  textbooks,  an  overview  of  many  important  topics  can  be 
found  in  the  recently  updated  mobile  communications  handbook  [12]  edited  by  Gibson. 

This book is intended to be accessible to readers who have never been exposed to communication
systems before. It has some overlap with more advanced graduate texts (e.g., Chapters 2, 4, 5
and 6 here overlap heavily with Chapters 2 and 3 in the author’s own graduate text [7]), and
provides the technical background and motivation required to easily access these more advanced
texts. Of course, the best way to continue building expertise in the field is by actually working
in it. Research and development in this field requires study of the research literature, of more
specialized texts (e.g., on information theory, channel coding, synchronization), and of commercial
standards. The Institute of Electrical and Electronics Engineers (IEEE) is responsible for
publication of many conference proceedings and journals in communications: major conferences
include the IEEE Global Telecommunications Conference (Globecom) and the IEEE International
Conference on Communications (ICC); major journals and magazines include IEEE Communications
Magazine, IEEE Transactions on Communications, and IEEE Journal on Selected Areas in
Communications. Closely related fields such as information theory and signal processing have their
own conferences, journals and magazines. Major conferences include the IEEE International
Symposium on Information Theory (ISIT) and the IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP); journals include the IEEE Transactions on Information Theory and
the IEEE Transactions on Signal Processing. The IEEE also publishes a number of standards
online, such as the IEEE 802 family of standards for local area networks.

A  useful  resource  for  learning  source  coding  and  data  compression,  which  are  not  discussed  in 
this  text,  is  the  textbook  by  Sayood  [13].  Textbooks  on  core  concepts  in  communication  networks 
include Bertsekas and Gallager [14], Kumar, Manjunath and Kuri [15], and Walrand and Varaiya [16].


Chapter 2

Signals  and  Systems 


A communication link involves several stages of signal manipulation: the transmitter transforms
the message into a signal that can be sent over a communication channel; the channel distorts
the signal and adds noise to it; and the receiver processes the noisy received signal to extract
the message. Thus, communication systems design must be based on a sound understanding of
signals, and the systems that shape them. In this chapter, we discuss concepts and terminology
from signals and systems, with a focus on how we plan to apply them in our discussion of
communication systems. Much of this chapter is a review of concepts with which the reader
might already be familiar from prior exposure to signals and systems. However, special attention
should be paid to the discussion of baseband and passband signals and systems (Sections 2.7
and 2.8). This material, which is crucial for our purpose, is typically not emphasized in a first
course on signals and systems. Additional material on the geometric relationship between signals
is covered in later chapters, when we discuss digital communication.

Chapter Plan: After a review of complex numbers and complex arithmetic in Section 2.1, we
provide some examples of useful signals in Section 2.2. We then discuss LTI systems and convolution
in Section 2.3. This is followed by Fourier series (Section 2.4) and the Fourier transform (Section
2.5). These sections (Sections 2.1 through 2.5) review material that is part of the assumed
background for the core content of this textbook. However, even readers familiar with the
material are encouraged to skim through it quickly in order to gain familiarity with the notation.
This gets us to the point where we can classify signals and systems based on the frequency band
they occupy. Specifically, we discuss baseband and passband signals and systems in Sections 2.7
and 2.8. Messages are typically baseband, while signals sent over channels (especially radio
channels) are typically passband. We discuss methods for going from baseband to passband and
back. We specifically emphasize the fact that a real-valued passband signal is equivalent (in a
mathematically convenient and physically meaningful sense) to a complex-valued baseband
signal, called the complex baseband representation, or complex envelope, of the passband signal.
We note that the information carried by a passband signal resides in its complex envelope, so that
modulation (or the process of encoding messages in waveforms that can be sent over physical
channels) consists of mapping information into a complex envelope, and then converting this
complex envelope into a passband signal. We discuss the physical significance of the rectangular
form of the complex envelope, which corresponds to the in-phase (I) and quadrature (Q)
components of the passband signal, and that of the polar form of the complex envelope, which
corresponds to the envelope and phase of the passband signal. We conclude by discussing the role
of complex baseband in transceiver implementations, and by illustrating its use for wireless
channel modeling.
2.1 Complex Numbers


Figure 2.1: A complex number $z$ represented in the two-dimensional real plane (horizontal axis $\mathrm{Re}(z)$).


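A complex number can be written as $z = x + jy$, where $x$ and $y$ are real numbers, and $j = \sqrt{-1}$. We say that $x = \mathrm{Re}(z)$ is the real part of $z$ and $y = \mathrm{Im}(z)$ is the imaginary part of $z$. As depicted in Figure 2.1, it is often advantageous to interpret the complex number $z$ as a two-dimensional real vector, which can be represented in rectangular form as $(x, y) = (\mathrm{Re}(z), \mathrm{Im}(z))$, or in polar form $(r, \theta)$ as

$$r = |z| = \sqrt{x^2 + y^2} \qquad (2.1)$$

$$\theta = \angle z = \tan^{-1}\frac{y}{x} \qquad (2.2)$$

where the angle is taken in the quadrant determined by the signs of $x$ and $y$ (for example, $z = -1 - j$ has $\theta = 5\pi/4$, equivalently $-3\pi/4$, not $\pi/4$).

We can go back from polar form to rectangular form as follows:

$$x = r\cos\theta, \qquad y = r\sin\theta$$

Complex conjugation: For a complex number $z = x + jy = re^{j\theta}$, its complex conjugate is

$$z^* = x - jy = re^{-j\theta} \qquad (2.3)$$

That is,

$$\mathrm{Re}(z^*) = \mathrm{Re}(z), \qquad \mathrm{Im}(z^*) = -\mathrm{Im}(z), \qquad \angle z^* = -\angle z \qquad (2.4)$$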

The real and imaginary parts of a complex number can be written in terms of $z$ and $z^*$ as follows:

$$\mathrm{Re}(z) = \frac{z + z^*}{2}, \qquad \mathrm{Im}(z) = \frac{z - z^*}{2j} \qquad (2.5)$$

Euler’s formula: This formula is of fundamental importance in complex analysis, and relates the rectangular and polar forms of a complex number:

$$e^{j\theta} = \cos\theta + j\sin\theta$$

The complex conjugate of $e^{j\theta}$ is given by

$$e^{-j\theta} = \left(e^{j\theta}\right)^* = \cos\theta - j\sin\theta \qquad (2.6)$$


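We can express cosines and sines in terms of $e^{j\theta}$ and its complex conjugate as follows:

$$\mathrm{Re}\left(e^{j\theta}\right) = \frac{e^{j\theta} + e^{-j\theta}}{2} = \cos\theta, \qquad \mathrm{Im}\left(e^{j\theta}\right) = \frac{e^{j\theta} - e^{-j\theta}}{2j} = \sin\theta \qquad (2.7)$$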

Substituting $x = r\cos\theta$ and $y = r\sin\theta$ into $z = x + jy$ and applying Euler’s formula, we can write

$$z = x + jy = r\cos\theta + jr\sin\theta = re^{j\theta} \qquad (2.8)$$

Being  able  to  go  back  and  forth  between  the  rectangular  and  polar  forms  of  a  complex  number 
is  useful.  For  example,  it  is  easier  to  add  in  the  rectangular  form,  but  it  is  easier  to  multiply  in 
the  polar  form. 
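A minimal sketch of going back and forth between the two forms, assuming Python 3.9+ and only the standard library. The helper names `to_polar` and `to_rect` are hypothetical; `math.atan2` resolves the quadrant of $\theta$ automatically and returns an angle in $(-\pi, \pi]$, which a bare $\tan^{-1}(y/x)$ would not.

```python
import cmath
import math


def to_polar(z: complex) -> tuple[float, float]:
    """Return (r, theta): r = |z| and theta = angle of z in (-pi, pi]."""
    return abs(z), math.atan2(z.imag, z.real)


def to_rect(r: float, theta: float) -> complex:
    """Return r e^{j theta} = r cos(theta) + j r sin(theta)."""
    return complex(r * math.cos(theta), r * math.sin(theta))


z = -1 - 1j                          # a point in the third quadrant
r, theta = to_polar(z)               # r = sqrt(2), theta = -3*pi/4
naive = math.atan(z.imag / z.real)   # tan^{-1}(y/x) alone gives pi/4: wrong quadrant
print(r, theta, naive)
print(cmath.polar(z))                # standard-library equivalent of to_polar
print(to_rect(r, theta))             # recovers z up to round-off
print(r * cmath.exp(1j * theta))     # Euler's formula gives the same point
```

The standard-library pair `cmath.polar` / `cmath.rect` does the same job; the explicit helpers just make the formulas visible.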

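Complex Addition: For two complex numbers $z_1 = x_1 + jy_1$ and $z_2 = x_2 + jy_2$,

$$z_1 + z_2 = (x_1 + x_2) + j(y_1 + y_2) \qquad (2.9)$$

That is,

$$\mathrm{Re}(z_1 + z_2) = \mathrm{Re}(z_1) + \mathrm{Re}(z_2), \qquad \mathrm{Im}(z_1 + z_2) = \mathrm{Im}(z_1) + \mathrm{Im}(z_2) \qquad (2.10)$$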

Complex Multiplication (rectangular form): For two complex numbers $z_1 = x_1 + jy_1$ and $z_2 = x_2 + jy_2$,

$$z_1z_2 = (x_1x_2 - y_1y_2) + j(y_1x_2 + x_1y_2) \qquad (2.11)$$

This follows simply by multiplying out, and setting $j^2 = -1$. We have

$$\mathrm{Re}(z_1z_2) = \mathrm{Re}(z_1)\mathrm{Re}(z_2) - \mathrm{Im}(z_1)\mathrm{Im}(z_2), \qquad \mathrm{Im}(z_1z_2) = \mathrm{Im}(z_1)\mathrm{Re}(z_2) + \mathrm{Re}(z_1)\mathrm{Im}(z_2) \qquad (2.12)$$

Note that, using the rectangular form, a single complex multiplication requires four real multiplications.
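A small Python sketch of (2.11) with the four real multiplications spelled out; `mul_rect` is a hypothetical helper name, and Python's built-in `complex` type is used only to cross-check the result.

```python
def mul_rect(x1: float, y1: float, x2: float, y2: float) -> tuple[float, float]:
    """(x1 + j y1)(x2 + j y2) via (2.11), using four real multiplications."""
    re = x1 * x2 - y1 * y2   # Re(z1)Re(z2) - Im(z1)Im(z2)
    im = y1 * x2 + x1 * y2   # Im(z1)Re(z2) + Re(z1)Im(z2)
    return re, im


re, im = mul_rect(2, 3, 4, -5)       # z1 = 2 + 3j, z2 = 4 - 5j
print(re, im)                        # 23 2
print((2 + 3j) * (4 - 5j))           # built-in complex product: (23+2j)
```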

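Complex Multiplication (polar form): Complex multiplication is easier when the numbers are expressed in polar form. For $z_1 = r_1e^{j\theta_1}$ and $z_2 = r_2e^{j\theta_2}$, we have

$$z_1z_2 = r_1r_2e^{j(\theta_1 + \theta_2)} \qquad (2.13)$$

That is, with angles defined modulo $2\pi$,

$$|z_1z_2| = |z_1||z_2|, \qquad \angle z_1z_2 = \angle z_1 + \angle z_2 \qquad (2.14)$$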

Division: For two complex numbers $z_1 = x_1 + jy_1 = r_1e^{j\theta_1}$ and $z_2 = x_2 + jy_2 = r_2e^{j\theta_2}$ (with $z_2 \neq 0$, i.e., $r_2 > 0$), it is easiest to express the result of division in polar form:

$$z_1/z_2 = (r_1/r_2)e^{j(\theta_1 - \theta_2)} \qquad (2.15)$$

That is,

$$|z_1/z_2| = |z_1|/|z_2|, \qquad \angle(z_1/z_2) = \angle z_1 - \angle z_2 \qquad (2.16)$$

In order to divide using rectangular form, it is convenient to multiply numerator and denominator by $z_2^*$, which gives

$$z_1/z_2 = \frac{z_1z_2^*}{z_2z_2^*} = \frac{z_1z_2^*}{|z_2|^2}$$

Multiplying out as usual, we get

$$z_1/z_2 = \frac{(x_1 + jy_1)(x_2 - jy_2)}{x_2^2 + y_2^2} = \frac{(x_1x_2 + y_1y_2) + j(-x_1y_2 + y_1x_2)}{x_2^2 + y_2^2} \qquad (2.17)$$
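A minimal sketch of division via (2.17), cross-checked against the polar form (2.15) using the standard-library `cmath.rect` and `cmath.phase`; the helper `div_rect` is a hypothetical name.

```python
import cmath


def div_rect(x1: float, y1: float, x2: float, y2: float) -> tuple[float, float]:
    """(x1 + j y1)/(x2 + j y2) via (2.17): multiply through by the conjugate of z2."""
    den = x2 * x2 + y2 * y2                 # |z2|^2
    if den == 0:
        raise ZeroDivisionError("z2 must be nonzero")
    re = (x1 * x2 + y1 * y2) / den
    im = (-x1 * y2 + y1 * x2) / den
    return re, im


z1, z2 = 23 + 2j, 4 - 5j
print(div_rect(z1.real, z1.imag, z2.real, z2.imag))   # (2.0, 3.0)

# Polar form (2.15): magnitudes divide, angles subtract
q_polar = cmath.rect(abs(z1) / abs(z2), cmath.phase(z1) - cmath.phase(z2))
print(q_polar)                                         # approximately (2+3j)
```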
Example 2.1.1 (Computations with complex numbers) Consider the complex numbers $z_1 = 1 + j$ and $z_2 = 2e^{-j\pi/6}$. Find $z_1 + z_2$, $z_1z_2$, and $z_1/z_2$. Also specify $z_1^*$ and $z_2^*$.

For complex addition, it is convenient to express both numbers in rectangular form. Thus,

$$z_2 = 2\left(\cos(-\pi/6) + j\sin(-\pi/6)\right) = \sqrt{3} - j$$

and

$$z_1 + z_2 = (1 + j) + (\sqrt{3} - j) = \sqrt{3} + 1$$

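For complex multiplication and division, it is convenient to express both numbers in polar form. We obtain $z_1 = \sqrt{2}e^{j\pi/4}$ by applying (2.1) and (2.2). Now, from (2.13), we have

$$z_1z_2 = \sqrt{2}e^{j\pi/4} \cdot 2e^{-j\pi/6} = 2\sqrt{2}e^{j(\pi/4 - \pi/6)} = 2\sqrt{2}e^{j\pi/12}$$

Similarly, from (2.15),

$$z_1/z_2 = \frac{\sqrt{2}e^{j\pi/4}}{2e^{-j\pi/6}} = \frac{1}{\sqrt{2}}e^{j(\pi/4 + \pi/6)} = \frac{1}{\sqrt{2}}e^{j5\pi/12}$$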

Multiplication using the rectangular forms of the complex numbers yields the following:

$$z_1z_2 = (1 + j)(\sqrt{3} - j) = \sqrt{3} - j + \sqrt{3}j + 1 = \left(\sqrt{3} + 1\right) + j\left(\sqrt{3} - 1\right)$$

Note that $z_1^* = 1 - j = \sqrt{2}e^{-j\pi/4}$ and $z_2^* = 2e^{j\pi/6} = \sqrt{3} + j$. Division using rectangular forms gives

$$z_1/z_2 = z_1z_2^*/|z_2|^2 = (1 + j)(\sqrt{3} + j)/2^2 = \frac{\sqrt{3} - 1}{4} + j\frac{\sqrt{3} + 1}{4}$$
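The results of Example 2.1.1 can be double-checked numerically. This sketch uses Python's `cmath` module and compares each computed quantity with the closed forms derived above; the dictionary layout is just an illustrative choice.

```python
import cmath
import math

z1 = 1 + 1j
z2 = 2 * cmath.exp(-1j * math.pi / 6)     # equals sqrt(3) - j up to round-off
sqrt3 = math.sqrt(3)

checks = {
    "z1 + z2":         (z1 + z2, sqrt3 + 1),
    "z1 z2 (polar)":   (z1 * z2, 2 * math.sqrt(2) * cmath.exp(1j * math.pi / 12)),
    "z1 z2 (rect)":    (z1 * z2, (sqrt3 + 1) + 1j * (sqrt3 - 1)),
    "z1 / z2 (polar)": (z1 / z2, cmath.exp(5j * math.pi / 12) / math.sqrt(2)),
    "z1 / z2 (rect)":  (z1 / z2, (sqrt3 - 1) / 4 + 1j * (sqrt3 + 1) / 4),
    "z1*":             (z1.conjugate(), math.sqrt(2) * cmath.exp(-1j * math.pi / 4)),
    "z2*":             (z2.conjugate(), sqrt3 + 1j),
}
for name, (computed, closed_form) in checks.items():
    # every difference should be at floating-point round-off level
    print(f"{name:16s} {computed:.6f}  error = {abs(computed - closed_form):.1e}")
```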


No need to memorize trigonometric identities any more: Once we can do computations using complex numbers, we can use Euler’s formula to quickly derive well-known trigonometric identities involving sines and cosines. For example,

$$\cos(\theta_1 + \theta_2) = \mathrm{Re}\left(e^{j(\theta_1 + \theta_2)}\right)$$

But

$$e^{j(\theta_1 + \theta_2)} = e^{j\theta_1}e^{j\theta_2} = (\cos\theta_1 + j\sin\theta_1)(\cos\theta_2 + j\sin\theta_2)$$

$$= (\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + j(\cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2)$$

Taking the real part, we can read off the identity

$$\cos(\theta_1 + \theta_2) = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2 \qquad (2.18)$$

Moreover, taking the imaginary part, we can read off

$$\sin(\theta_1 + \theta_2) = \cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2 \qquad (2.19)$$
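As a quick numerical sanity check (not a proof), one can evaluate both sides of (2.18) and (2.19), together with the real and imaginary parts of $e^{j\theta_1}e^{j\theta_2}$, at random angles. The function name, seed, and sampling range below are illustrative assumptions.

```python
import cmath
import math
import random


def max_identity_error(trials: int = 1000, seed: int = 0) -> float:
    """Largest discrepancy seen between e^{j t1} e^{j t2} and the right sides of (2.18), (2.19)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        t1, t2 = rng.uniform(-10, 10), rng.uniform(-10, 10)
        w = cmath.exp(1j * t1) * cmath.exp(1j * t2)                            # e^{j(t1 + t2)}
        cos_sum = math.cos(t1) * math.cos(t2) - math.sin(t1) * math.sin(t2)   # (2.18)
        sin_sum = math.cos(t1) * math.sin(t2) + math.sin(t1) * math.cos(t2)   # (2.19)
        worst = max(worst,
                    abs(w.real - cos_sum), abs(w.imag - sin_sum),
                    abs(math.cos(t1 + t2) - cos_sum), abs(math.sin(t1 + t2) - sin_sum))
    return worst


print(max_identity_error())   # round-off level, far below 1e-12
```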


2.2  Signals 

Signal: A signal $s(t)$ is a function of time (or some other independent variable, such as frequency, or spatial coordinates) which has an interesting physical interpretation. For example, it is generated by a transmitter, or processed by a receiver. While physically realizable signals such as those sent over a wire or over the air must take real values, we shall see that it is extremely useful (and physically meaningful) to consider a pair of real-valued signals, interpreted as the real and imaginary parts of a complex-valued signal. Thus, in general, we allow signals to take complex values.

Discrete versus Continuous Time: We generically use the notation $x(t)$ to denote continuous time signals ($t$ taking real values), and $x[n]$ to denote discrete time signals ($n$ taking integer values). A continuous time signal $x(t)$ sampled at rate $1/T_s$ produces discrete time samples $x(nT_s + t_0)$ ($t_0$ an arbitrary offset), which we often denote as a discrete time signal $x[n]$. While signals sent over a physical communication channel are inherently continuous time, implementations at both the transmitter and receiver make heavy use of discrete time processing of digitized samples corresponding to the analog continuous time waveforms of interest.

We  now  introduce  some  signals  that  recur  often  in  this  text. 

Sinusoid: This is a periodic function of time of the form

$$s(t) = A\cos(2\pi f_0t + \theta) \qquad (2.20)$$

where $A > 0$ is the amplitude, $f_0$ is the frequency, and $\theta \in [0, 2\pi]$ is the phase. By setting $\theta = 0$, we obtain a pure cosine $A\cos 2\pi f_0t$, and by setting $\theta = -\pi/2$ (equivalently, $\theta = 3\pi/2$) we obtain a pure sine $A\sin 2\pi f_0t$. In general, using (2.18), we can rewrite (2.20) as

$$s(t) = A_c\cos 2\pi f_0t - A_s\sin 2\pi f_0t \qquad (2.21)$$

where $A_c = A\cos\theta$ and $A_s = A\sin\theta$ are real numbers. Using Euler’s formula, we can write

$$Ae^{j\theta} = A_c + jA_s \qquad (2.22)$$

Thus, the parameters of a sinusoid at frequency $f_0$ can be represented by the complex number in (2.22), with (2.20) using the polar form, and (2.21) the rectangular form, of this number. Note that $A = \sqrt{A_c^2 + A_s^2}$ and $\theta = \tan^{-1}(A_s/A_c)$, with the angle taken in the appropriate quadrant.
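A minimal sketch of moving between the polar parameters $(A, \theta)$ and the rectangular parameters $(A_c, A_s)$ of a sinusoid, checking (2.20) against (2.21) at a few sample times. The parameter values and helper names are hypothetical. Note that `math.atan2` returns $\theta$ in $(-\pi, \pi]$, so a phase specified in $[0, 2\pi]$ may come back shifted by $2\pi$.

```python
import math


def polar_to_iq(A: float, theta: float) -> tuple[float, float]:
    """(A, theta) -> (A_c, A_s) = (A cos theta, A sin theta)."""
    return A * math.cos(theta), A * math.sin(theta)


def iq_to_polar(Ac: float, As: float) -> tuple[float, float]:
    """(A_c, A_s) -> (A, theta); atan2 chooses the correct quadrant."""
    return math.hypot(Ac, As), math.atan2(As, Ac)


A, theta, f0 = 2.0, 2.0, 5.0            # hypothetical amplitude, phase (rad), frequency (Hz)
Ac, As = polar_to_iq(A, theta)

max_err = 0.0
for t in (0.0, 0.013, 0.37, 1.1):
    w = 2 * math.pi * f0 * t
    polar_form = A * math.cos(w + theta)                  # (2.20)
    rect_form = Ac * math.cos(w) - As * math.sin(w)       # (2.21)
    max_err = max(max_err, abs(polar_form - rect_form))

print(max_err)                  # round-off level
print(iq_to_polar(Ac, As))      # recovers (2.0, 2.0) up to round-off
```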

Clearly,  sinusoids  with  known  amplitude,  phase  and  frequency  are  perfectly  predictable,  and 
hence  cannot  carry  any  information.  As  we  shall  see,  information  can  be  transmitted  by  making 
the complex number $Ae^{j\theta} = A_c + jA_s$ associated with the parameters of the sinusoid vary in a way
that  depends  on  the  message  to  be  conveyed.  Of  course,  once  this  is  done,  the  resulting  signal 
will  no  longer  be  a  pure  sinusoid,  and  part  of  the  work  of  the  communication  system  designer  is 
to  decide  what  shape  such  a  signal  should  take  in  the  frequency  domain. 

We now define complex exponentials, which play a key role in understanding signals and systems
in  the  frequency  domain. 

Complex exponential: A complex exponential at a frequency $f_0$ is defined as

$$s(t) = Ae^{j(2\pi f_0t + \theta)} = \alpha e^{j2\pi f_0t} \qquad (2.23)$$

where $A > 0$ is the amplitude, $f_0$ is the frequency, $\theta \in [0, 2\pi]$ is the phase, and $\alpha = Ae^{j\theta}$ is a complex number that contains both the amplitude and phase information. Let us now make three observations. First, note the ease with which we handle amplitude and phase for complex exponentials: they simply combine into a complex number that factors out of the complex exponential. Second, by Euler’s formula,

$$\mathrm{Re}\left(Ae^{j(2\pi f_0t + \theta)}\right) = A\cos(2\pi f_0t + \theta)$$

so that real-valued sinusoids are “contained in” complex exponentials. Third, as we shall soon see, the set of complex exponentials $\{e^{j2\pi ft}\}$, where $f$ takes values in $(-\infty, \infty)$, forms a “basis” for a large class of signals (basically, for all signals that are of interest to us), and the Fourier transform of a signal is simply its expansion with respect to this basis. Such observations are key to why complex exponentials play such an important role in signals and systems in general, and in communication systems in particular.

Figure 2.2: The impulse function may be viewed as a limit of tall thin pulses ($a \to 0$ in the examples shown in the figure, such as a pulse of height $1/a$ over $[-a/2, a/2]$).

Figure 2.3: Multiplying a signal with a tall thin pulse of unit area to select its value at $t_0$.

The Delta, or Impulse, Function: Another signal that plays a crucial role in signals and systems is the delta function, or the unit impulse, which we denote by $\delta(t)$. Physically, we can think of it as a narrow, tall pulse with unit area: examples are shown in Figure 2.2. Mathematically, we can think of it as a limit of such pulses as the pulse width shrinks (and hence the pulse height goes to infinity). Such a limit is not physically realizable, but it serves a very useful purpose in terms of understanding the structure of physically realizable signals. That is, consider a signal $s(t)$ that varies smoothly, and multiply it with a tall, thin pulse $p(t)$ of unit area around time $t_0$, nonzero only over the interval $[t_0 - a_1, t_0 + a_2]$, as shown in Figure 2.3. If we now integrate the product, we obtain

$$\int_{-\infty}^{\infty} s(t)p(t)\,dt = \int_{t_0 - a_1}^{t_0 + a_2} s(t)p(t)\,dt \approx s(t_0)\int_{t_0 - a_1}^{t_0 + a_2} p(t)\,dt = s(t_0)$$

That is, the preceding operation “selects” the value of the signal at time $t_0$. Taking the limit of the tall thin pulse as its width $a_1 + a_2 \to 0$, we get a translated version of the delta function, namely, $\delta(t - t_0)$. Note that the exact shape of the pulse does not matter in the preceding argument. The delta function is therefore defined by means of the following sifting property: for any “smooth” function $s(t)$, we have

$$\int_{-\infty}^{\infty} s(t)\delta(t - t_0)\,dt = s(t_0) \qquad \text{(sifting property of the impulse)} \qquad (2.24)$$


Thus, the delta function is defined mathematically by the way it acts on other signals, rather
than  as  a  signal  by  itself.  However,  it  is  also  important  to  keep  in  mind  its  intuitive  interpretation 
as  (the  limit  of)  a  tall,  thin,  pulse  of  unit  area. 
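The limiting argument above can be illustrated numerically. This sketch integrates $s(t)p(t)$ for rectangular pulses of unit area and shrinking width $a$ centered at $t_0$, using the midpoint rule; the test signal $\cos t$, the point $t_0 = 0.7$, and the helper name `pulse_integral` are arbitrary choices.

```python
import math


def pulse_integral(s, t0: float, a: float, n: int = 10_000) -> float:
    """Midpoint-rule value of the integral of s(t) p(t) dt, where p is a rectangular
    pulse of width a and height 1/a (unit area) centered at t0."""
    h = a / n
    total = 0.0
    for k in range(n):
        t = t0 - a / 2 + (k + 0.5) * h
        total += s(t) * (1.0 / a) * h
    return total


t0 = 0.7
for a in (1.0, 0.1, 0.01):
    approx = pulse_integral(math.cos, t0, a)
    print(f"a = {a:5.2f}: integral = {approx:.8f}, s(t0) = {math.cos(t0):.8f}")
```

For this pulse the exact value is $\cos(t_0)\sin(a/2)/(a/2)$, so the error shrinks roughly as $a^2$.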
The following function is useful for expressing signals compactly.

Indicator function: We use $I_A$ to denote the indicator function of a set $A$, defined as

$$I_A(x) = \begin{cases} 1, & x \in A \\ 0, & \text{otherwise} \end{cases}$$

The indicator function of an interval is a rectangular pulse, as shown in Figure 2.4.

Figure 2.4: The indicator function of an interval is a rectangular pulse (shown: $I_{[a,b]}(x)$, equal to one for $a \le x \le b$).


Figure 2.5: The functions $u(t)$ and $v(t) = 3I_{[-1,0]}(t) + I_{[0,1]}(t) - I_{[1,2]}(t)$ can be written compactly in terms of indicator functions.


The  indicator  function  can  also  be  used  to  compactly  express  more  complex  signals,  as  shown  in 
the  examples  in  Figure  2.5. 
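Indicator functions translate directly into code. In this sketch, `indicator` is a hypothetical helper that returns $I_{[a,b]}$ as a Python function, and `v` builds the signal $v(t) = 3I_{[-1,0]}(t) + I_{[0,1]}(t) - I_{[1,2]}(t)$ of Figure 2.5 from three such pulses. Because the intervals are closed, adjacent pulses overlap at their shared endpoints; such single points do not affect integrals such as energies or inner products.

```python
from typing import Callable


def indicator(a: float, b: float) -> Callable[[float], float]:
    """Return the function x -> I_[a,b](x): 1 for a <= x <= b, 0 otherwise."""
    return lambda x: 1.0 if a <= x <= b else 0.0


I_m10, I_01, I_12 = indicator(-1, 0), indicator(0, 1), indicator(1, 2)


def v(t: float) -> float:
    """v(t) = 3 I_[-1,0](t) + I_[0,1](t) - I_[1,2](t)."""
    return 3 * I_m10(t) + I_01(t) - I_12(t)


print([v(t) for t in (-0.5, 0.5, 1.5, 2.5)])   # [3.0, 1.0, -1.0, 0.0]
```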

Sinc function: The sinc function, plotted in Figure 2.6, is defined as

$$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$$

where the value at $x = 0$ is defined as the limit as $x \to 0$ to be $\mathrm{sinc}(0) = 1$. Since $|\sin(\pi x)| \le 1$, we have that $|\mathrm{sinc}(x)| \le \frac{1}{\pi|x|}$ for $x \neq 0$, with equality if and only if $x$ is an odd multiple of $1/2$. That is, the sinc function exhibits a sinusoidal variation, with an envelope that decays as $\frac{1}{\pi|x|}$.
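A minimal sketch of the sinc function with the removable singularity at $x = 0$ handled explicitly, checking the envelope bound $|\mathrm{sinc}(x)| \le 1/(\pi|x|)$ at a few arbitrarily chosen points. (NumPy's `numpy.sinc` uses the same normalized definition, $\sin(\pi x)/(\pi x)$.)

```python
import math


def sinc(x: float) -> float:
    """sinc(x) = sin(pi x)/(pi x), with sinc(0) = 1 (the limiting value)."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


for x in (0.5, 1.5, 2.5, 0.3, 1.2, 3.7, 2.0):
    envelope = 1 / (math.pi * abs(x))
    # equality holds at odd multiples of 1/2; sinc vanishes at nonzero integers
    print(f"x = {x:3.1f}: |sinc(x)| = {abs(sinc(x)):.4f}, 1/(pi|x|) = {envelope:.4f}")
```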

The analogy between signals and vectors: Even though signals can be complicated functions of time that live in an infinite-dimensional space, the mathematics for manipulating them is very similar to that for manipulating finite-dimensional vectors, with sums replaced by integrals. A key building block of communication theory is the relative geometry of the signals used, which is governed by the inner products between signals. Inner products for continuous-time signals can be defined in a manner exactly analogous to the corresponding definitions in finite-dimensional vector space.

Inner Product: The inner product for two $m \times 1$ complex vectors $s = (s[1], \ldots, s[m])^T$ and $r = (r[1], \ldots, r[m])^T$ is given by

$$\langle s, r \rangle = \sum_{i=1}^{m} s[i]r^*[i] = r^Hs \qquad (2.25)$$
Figure 2.6: The sinc function.


Similarly, we define the inner product of two (possibly complex-valued) signals $s(t)$ and $r(t)$ as follows:

$$\langle s, r \rangle = \int_{-\infty}^{\infty} s(t)r^*(t)\,dt \qquad (2.26)$$

The inner product obeys the following linearity properties:

$$\langle a_1s_1 + a_2s_2, r \rangle = a_1\langle s_1, r \rangle + a_2\langle s_2, r \rangle$$

$$\langle s, a_1r_1 + a_2r_2 \rangle = a_1^*\langle s, r_1 \rangle + a_2^*\langle s, r_2 \rangle$$

where $a_1, a_2$ are complex-valued constants, and $s, s_1, s_2, r, r_1, r_2$ are signals (or vectors). The complex conjugation when we pull out constants from the second argument of the inner product is something that we need to maintain awareness of when computing inner products for complex-valued signals.
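The conjugate-linearity in the second argument is easy to get wrong in code. In this sketch, plain Python lists serve as vectors, `inner` is a hypothetical helper implementing (2.25), and the vectors and constants are arbitrary; pulling $a_1, a_2$ out of the second argument matches only when they are conjugated.

```python
def inner(s, r):
    """<s, r> = sum_i s[i] conj(r[i]) for equal-length complex sequences, as in (2.25)."""
    if len(s) != len(r):
        raise ValueError("vectors must have the same length")
    return sum(si * complex(ri).conjugate() for si, ri in zip(s, r))


s = [1 + 2j, -1j, 3]
r1 = [2, 1 + 1j, -1j]
r2 = [0.5j, 2, 1 - 1j]
a1, a2 = 2 - 1j, 1j

combo = [a1 * x + a2 * y for x, y in zip(r1, r2)]       # a1 r1 + a2 r2
lhs = inner(s, combo)
rhs_conj = a1.conjugate() * inner(s, r1) + a2.conjugate() * inner(s, r2)
rhs_no_conj = a1 * inner(s, r1) + a2 * inner(s, r2)     # wrong: forgets the conjugates

print(abs(lhs - rhs_conj))       # 0 up to round-off
print(abs(lhs - rhs_no_conj))    # clearly nonzero
print(inner(s, s))               # real and nonnegative: the squared norm of s
```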

Energy and Norm: The energy $E_s$ of a signal $s$ is defined as its inner product with itself:

$$E_s = ||s||^2 = \langle s, s \rangle = \int_{-\infty}^{\infty} |s(t)|^2\,dt \qquad (2.27)$$

where $||s||$ denotes the norm of $s$. If the energy of $s$ is zero, then $s$ must be zero “almost everywhere” (e.g., $s(t)$ cannot be nonzero over any interval, no matter how small its length). For continuous-time signals, we take this to be equivalent to being zero everywhere. With this understanding, $||s|| = 0$ implies that $s$ is zero, which is a property that is true for norms in finite-dimensional vector spaces.


Example 2.2.1 (Energy computations) Consider $s(t) = 2I_{[0,T]}(t) + jI_{[T/2,2T]}(t)$. Writing it out in more detail, we have

$$s(t) = \begin{cases} 2, & 0 \le t < T/2 \\ 2 + j, & T/2 \le t \le T \\ j, & T < t \le 2T \end{cases}$$

so that its energy is given by

$$||s||^2 = \int_0^{T/2} 2^2\,dt + \int_{T/2}^{T} |2 + j|^2\,dt + \int_T^{2T} |j|^2\,dt = 4(T/2) + 5(T/2) + T = 11T/2$$


As another example, consider $s(t) = e^{-3|t| + j2\pi t}$, for which the energy is given by

$$||s||^2 = \int_{-\infty}^{\infty} \left|e^{-3|t| + j2\pi t}\right|^2 dt = \int_{-\infty}^{\infty} e^{-6|t|}\,dt = 2\int_0^{\infty} e^{-6t}\,dt = 1/3$$

Note that the complex phase term $j2\pi t$ does not affect the energy, since it goes away when we take the magnitude.
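Both energies in Example 2.2.1 can be confirmed by numerical integration. In this sketch the value $T = 1$ is a hypothetical choice, `energy` is an illustrative midpoint-rule helper, and the infinite integration range for the second signal is truncated to $[-10, 10]$, where the neglected tails are of order $e^{-60}$.

```python
import cmath
import math


def energy(s, t_start: float, t_stop: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of |s(t)|^2 over [t_start, t_stop]."""
    h = (t_stop - t_start) / n
    return sum(abs(s(t_start + (k + 0.5) * h)) ** 2 for k in range(n)) * h


T = 1.0   # hypothetical value of T


def s1(t: float) -> complex:
    """s(t) = 2 I_[0,T](t) + j I_[T/2,2T](t)."""
    return 2 * (0 <= t <= T) + 1j * (T / 2 <= t <= 2 * T)


def s2(t: float) -> complex:
    """s(t) = exp(-3|t| + j 2 pi t)."""
    return cmath.exp(-3 * abs(t) + 2j * math.pi * t)


print(energy(s1, 0.0, 2 * T))   # close to 11T/2 = 5.5
print(energy(s2, -10.0, 10.0))  # close to 1/3
```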


Power: The power of a signal $s(t)$ is defined as the time average of its energy computed over a large time interval:

$$P_s = \lim_{T_o \to \infty} \frac{1}{T_o}\int_{-T_o/2}^{T_o/2} |s(t)|^2\,dt \qquad (2.28)$$

Finite energy signals, of course, have zero power.

We  see  from  (2.28)  that  power  is  defined  as  a  time  average.  It  is  useful  to  introduce  a  compact 
notation  for  time  averages. 

Time average: For a function $g(t)$, define the time average as

$$\overline{g} = \lim_{T_o \to \infty} \frac{1}{T_o}\int_{-T_o/2}^{T_o/2} g(t)\,dt \qquad (2.29)$$

That is, we compute the time average over an observation interval of length $T_o$, and then let the observation interval get large. We can now rewrite the power computation in (2.28) in this notation as follows.

Power: The power of a signal $s(t)$ is defined as

$$P_s = \overline{|s(t)|^2} \qquad (2.30)$$

Another time average of interest is the DC value of a signal.

DC value: The DC value of $s(t)$ is defined as $\overline{s(t)}$.

Let us compute these quantities for the simple example of a complex exponential, $s(t) = Ae^{j(2\pi f_0t + \theta)}$, where $A > 0$ is the amplitude, $\theta \in [0, 2\pi]$ is the phase, and $f_0$ is a real-valued frequency. Since $|s(t)|^2 = A^2$ for all $t$, we get the same value when we average it. Thus, the power is given by $P_s = \overline{|s(t)|^2} = A^2$. For nonzero frequency $f_0$, it is intuitively clear that all the power in $s$ is concentrated away from DC, since $s(t) = Ae^{j(2\pi f_0t + \theta)} \leftrightarrow S(f) = Ae^{j\theta}\delta(f - f_0)$.

We therefore see that the DC value is zero. While this is a convincing intuitive argument, it is instructive to prove this starting from the definition (2.29).

Proving that a complex exponential has zero DC value: For $s(t) = Ae^{j(2\pi f_0t + \theta)}$ with $f_0 > 0$ (the case $f_0 < 0$ is identical, with $f_0$ replaced by $|f_0|$), the integral over its period (of length $1/f_0$) is zero. As shown in Figure 2.7, the length $L$ of any interval $I$ can be written as $L = K/f_0 + \ell$, where $K$ is a nonnegative integer and $0 \le \ell < 1/f_0$ is the length of the remaining interval $I_r$. Since the integral over an integer number of periods is zero, we have

$$\int_I s(t)\,dt = \int_{I_r} s(t)\,dt$$

Thus,

$$\left|\int_I s(t)\,dt\right| = \left|\int_{I_r} s(t)\,dt\right| \le \ell\max_t|s(t)| = A\ell < \frac{A}{f_0}$$
Figure 2.7: The interval $I$ for computing the time average of a periodic function with period $1/f_0$ can be decomposed into an integer number $K$ of periods, with the remaining interval $I_r$ of length $\ell < 1/f_0$.


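since $|s(t)| = A$. We therefore obtain

$$\left|\int_{-T_o/2}^{T_o/2} s(t)\,dt\right| < A/f_0$$

which yields that the DC value $\overline{s} = 0$, since

$$|\overline{s}| = \lim_{T_o \to \infty}\left|\frac{1}{T_o}\int_{-T_o/2}^{T_o/2} s(t)\,dt\right| \le \lim_{T_o \to \infty}\frac{A}{f_0T_o} = 0$$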

Essentially  the  same  argument  implies  that,  in  general,  the  time  average  of  a  periodic  signal 
equals  the  average  over  a  single  period.  We  use  this  fact  without  further  comment  henceforth. 

Power and DC value of a sinusoid: For a real-valued sinusoid $s(t) = A\cos(2\pi f_0t + \theta)$, we can use the results derived for complex exponentials above. Using Euler’s identity, a real-valued sinusoid at $f_0$ is a sum of complex exponentials at $\pm f_0$:

$$s(t) = \frac{A}{2}e^{j(2\pi f_0t + \theta)} + \frac{A}{2}e^{-j(2\pi f_0t + \theta)}$$

Since each complex exponential has zero DC value, we obtain

$$\overline{s} = 0$$

That is, the DC value of any real-valued sinusoid (with $f_0 \neq 0$) is zero. Similarly, the power is

$$P_s = \overline{s^2(t)} = \overline{A^2\cos^2(2\pi f_0t + \theta)} = \frac{A^2}{2} + \frac{A^2}{2}\,\overline{\cos(4\pi f_0t + 2\theta)} = \frac{A^2}{2}$$

since the DC value of the sinusoid at $2f_0$ is zero.
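The power and DC values just derived can be checked by approximating the time average (2.29) over a long but finite window. In this sketch the amplitude, frequency, phase, and window length $T_o$ are hypothetical choices; the window is deliberately not an integer number of periods, and the residual DC value stays within the bound $A/(f_0T_o)$ from the proof above.

```python
import cmath
import math


def time_average(g, To: float, n: int = 100_000):
    """Midpoint-rule approximation of (1/To) times the integral of g(t) over [-To/2, To/2]."""
    h = To / n
    return sum(g(-To / 2 + (k + 0.5) * h) for k in range(n)) * h / To


A, f0, theta = 1.5, 3.0, 0.4     # hypothetical sinusoid parameters
To = 100.3                       # long window, not an integer number of periods


def exp_sig(t: float) -> complex:
    return A * cmath.exp(1j * (2 * math.pi * f0 * t + theta))


def cos_sig(t: float) -> float:
    return A * math.cos(2 * math.pi * f0 * t + theta)


print(abs(time_average(exp_sig, To)))                       # DC: near 0
print(time_average(lambda t: abs(exp_sig(t)) ** 2, To))     # power: near A^2 = 2.25
print(time_average(cos_sig, To))                            # DC: near 0
print(time_average(lambda t: cos_sig(t) ** 2, To))          # power: near A^2/2 = 1.125
print(A / (f0 * To))                                        # bound on |DC| for this window
```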


2.3  Linear  Time  Invariant  Systems 

System:  A  system  takes  as  input  one  or  more  signals,  and  produces  as  output  one  or  more 
signals. A system is specified once we characterize its input-output relationship; that is, if we
can  determine  the  output,  or  response,  y(t),  corresponding  to  any  possible  input  x(t)  in  a  given 
class  of  signals  of  interest. 
Our primary focus here is on linear time invariant (LTI) systems, which provide good models for filters at the transmitter and receiver, as well as for the distortion induced by a variety of channels. We shall see that the input-output relationship is particularly easy to characterize for such systems.

Linear system: Let $x_1(t)$ and $x_2(t)$ denote arbitrary input signals, and let $y_1(t)$ and $y_2(t)$ denote the corresponding system outputs, respectively. Then, for arbitrary scalars $a_1$ and $a_2$, the response of the system to input $a_1x_1(t) + a_2x_2(t)$ is $a_1y_1(t) + a_2y_2(t)$.

Time invariant system: Let $y(t)$ denote the system response to an input $x(t)$. Then the system response to a time-shifted version of the input, $x_1(t) = x(t - t_0)$, is $y_1(t) = y(t - t_0)$. That is, a time shift in the input causes an identical time shift in the output.


Example 2.3.1 (Examples of linear systems) It can (and should) be checked that the following systems are linear. These examples show that linear systems may or may not be time invariant.

$$y(t) = 2x(t - 1) - jx(t - 2) \qquad \text{(time invariant)}$$

$$y(t) = (3 - 2j)x(1 - t) \qquad \text{(time varying)}$$

$$y(t) = x(t)\cos(100\pi t) - x(t - 1)\sin(100\pi t) \qquad \text{(time varying)}$$

$$y(t) = \int_{t-1}^{t} x(\tau)\,d\tau \qquad \text{(time invariant)}$$
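Linearity and time invariance can also be probed numerically. The sketch below uses discrete-time analogs of the first and third systems above (an assumption made for illustration, with the modulating frequency chosen arbitrarily as 0.3 cycles per sample) and tests them on random complex inputs. Such randomized checks can expose a violation but cannot prove that a property holds, so the analytical check asked for above is still needed.

```python
import math
import random


def sys_ti(x):
    """Discrete-time analog of y(t) = 2x(t - 1) - j x(t - 2)."""
    return lambda n: 2 * x(n - 1) - 1j * x(n - 2)


def sys_tv(x):
    """Discrete-time analog of y(t) = x(t) cos(100 pi t) - x(t - 1) sin(100 pi t),
    with a hypothetical modulating frequency of 0.3 cycles per sample."""
    w = 2 * math.pi * 0.3
    return lambda n: x(n) * math.cos(w * n) - x(n - 1) * math.sin(w * n)


def random_signal(rng, length=20):
    """Random complex signal supported on 0 <= n < length (zero elsewhere)."""
    vals = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(length)]
    return lambda n: vals[n] if 0 <= n < length else 0j


def is_linear(system, rng, trials=20, span=range(-5, 30)):
    for _ in range(trials):
        x1, x2 = random_signal(rng), random_signal(rng)
        a1 = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        a2 = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        y_mix = system(lambda n: a1 * x1(n) + a2 * x2(n))
        y1, y2 = system(x1), system(x2)
        if any(abs(y_mix(n) - (a1 * y1(n) + a2 * y2(n))) > 1e-9 for n in span):
            return False
    return True


def is_time_invariant(system, rng, trials=20, span=range(-5, 30), shift=3):
    for _ in range(trials):
        x = random_signal(rng)
        y = system(x)
        y_from_shifted = system(lambda n: x(n - shift))
        if any(abs(y_from_shifted(n) - y(n - shift)) > 1e-9 for n in span):
            return False
    return True


rng = random.Random(1)
print(is_linear(sys_ti, rng), is_time_invariant(sys_ti, rng))   # True True
print(is_linear(sys_tv, rng), is_time_invariant(sys_tv, rng))   # True False
```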

Example 2.3.2 (Examples of time invariant systems) It can (and should) be checked that the following systems are time invariant. These examples show that time invariant systems may or may not be linear.

y{t)  =  nonlinear 

$$y(t) = \int_{-\infty}^{t} x(\tau)e^{-(t - \tau)}\,d\tau \qquad \text{(linear)}$$

$$y(t) = \int_{t-1}^{t} x^2(\tau)\,d\tau \qquad \text{(nonlinear)}$$

Linear time invariant system: A linear time invariant (LTI) system is (unsurprisingly) defined to be a system which is both linear and time invariant. What is surprising, however, is how powerful the LTI property is in terms of dictating what the input-output relationship must look like. Specifically, if we know the impulse response of an LTI system (i.e., the output signal when the input signal is the delta function), then we can compute the system response for any input signal. Before deriving and stating this result, we illustrate the LTI property using an example; see Figure 2.8. Suppose that the response of an LTI system to the rectangular pulse $p_1(t) = I_{[-1,1]}(t)$ is given by the trapezoidal waveform $h_1(t)$. We can now compute the system response to any linear combination of time shifts of the pulse $p_1(t)$, as illustrated by the example in the figure. More generally, using the LTI property, we infer that the response to an input signal of the form $x(t) = \sum_i a_ip_1(t - t_i)$ is $y(t) = \sum_i a_ih_1(t - t_i)$.

Figure 2.8: Given that the response of an LTI system S to the pulse $p_1(t)$ is $h_1(t)$, we can use the LTI property to infer that the response to $x(t) = 2p_1(t) - p_1(t - 1)$ is $y(t) = 2h_1(t) - h_1(t - 1)$.

Figure 2.9: A smooth signal can be approximated as a linear combination of shifts of tall thin pulses.

Can we extend the preceding idea to compute the system response to arbitrary input signals? The answer is yes: if we know the system response to thinner and thinner pulses, then we can approximate arbitrary signals better and better using linear combinations of shifts of these pulses. Consider a rectangular pulse $p_\Delta(t)$ of width $\Delta$ and height $1/\Delta$, where $\Delta > 0$ is getting smaller and smaller. Note that we have normalized the area of the pulse to unity, so that the limit of $p_\Delta(t)$ as $\Delta \to 0$ is the delta function. Figure 2.9 shows how to approximate a smooth input signal as a linear combination of shifts of $p_\Delta(t)$. That is, for $\Delta$ small, we have


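$$x(t) \approx x_\Delta(t) = \sum_{k=-\infty}^{\infty} x(k\Delta)\,\Delta\,p_\Delta(t - k\Delta) \qquad (2.31)$$

If the system response to $p_\Delta(t)$ is $h_\Delta(t)$, then we can use the LTI property to compute the response $y_\Delta(t)$ to $x_\Delta(t)$, and use this to approximate the response $y(t)$ to the input $x(t)$, as follows:

$$y(t) \approx y_\Delta(t) = \sum_{k=-\infty}^{\infty} x(k\Delta)\,\Delta\,h_\Delta(t - k\Delta) \qquad (2.32)$$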


As $\Delta \to 0$, the sums above tend to integrals, and the pulse $p_\Delta(t)$ tends to the delta function $\delta(t)$. The approximation to the input signal in equation (2.31) becomes exact, with the sum tending to an integral:

$$\lim_{\Delta \to 0} x_\Delta(t) = x(t) = \int_{-\infty}^{\infty} x(\tau)\delta(t - \tau)\,d\tau$$

replacing the discrete time shifts $k\Delta$ by the continuous variable $\tau$, the discrete increment $\Delta$ by the infinitesimal $d\tau$, and the sum by an integral. This is just a restatement of the sifting property of the impulse. That is, an arbitrary input signal can be expressed as a linear combination of time-shifted versions of the delta function, where we now consider a continuum of time shifts.

In similar fashion, the approximation to the output signal in (2.32) becomes exact, with the sum reducing to the following convolution integral:

$$\lim_{\Delta \to 0} y_\Delta(t) = y(t) = \int_{-\infty}^{\infty} x(\tau)h(t - \tau)\,d\tau \qquad (2.33)$$

where $h(t)$ denotes the impulse response of the LTI system.
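To see the sum (2.32) turn into the integral (2.33) numerically, the sketch below evaluates the Riemann sum $\sum_k x(k\Delta)h(t - k\Delta)\Delta$ (that is, (2.32) with $h_\Delta$ replaced by its limit $h$) for shrinking $\Delta$. The input $x(t) = I_{[0,1]}(t)$ and the impulse response $h(t) = e^{-t}$ for $t \ge 0$ (zero otherwise) are hypothetical choices; for them the convolution integral works out to $y(t) = e^{-t}\left(e^{\min(t,1)} - 1\right)$ for $t \ge 0$.

```python
import math


def x(t: float) -> float:
    return 1.0 if 0 <= t <= 1 else 0.0            # input I_[0,1](t)


def h(t: float) -> float:
    return math.exp(-t) if t >= 0 else 0.0        # hypothetical impulse response


def y_exact(t: float) -> float:
    """Closed form of the convolution integral (2.33) for this x and h."""
    return math.exp(-t) * (math.exp(min(t, 1.0)) - 1.0) if t >= 0 else 0.0


def y_riemann(t: float, delta: float) -> float:
    """Riemann sum over the samples k*delta where x can be nonzero (0 <= k*delta <= 1)."""
    k_max = int(round(1 / delta))
    return sum(x(k * delta) * h(t - k * delta) for k in range(k_max + 1)) * delta


t = 1.7
for delta in (0.1, 0.01, 0.001):
    print(f"delta = {delta}: approx = {y_riemann(t, delta):.5f}, exact = {y_exact(t):.5f}")
```

The error shrinks roughly in proportion to $\Delta$, as expected for a first-order Riemann sum.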

Convolution and its computation: The convolution $v(t)$ of two signals $u_1(t)$ and $u_2(t)$ is given by

$$v(t) = (u_1 * u_2)(t) = \int_{-\infty}^{\infty} u_1(\tau)u_2(t - \tau)\,d\tau = \int_{-\infty}^{\infty} u_1(t - \tau)u_2(\tau)\,d\tau \qquad (2.34)$$

Note that $\tau$ is a dummy variable that is integrated out in order to determine the value of the signal $v(t)$ at each possible time $t$. The role of $u_1$ and $u_2$ in the integral can be exchanged. This can be proved using a change of variables, replacing $t - \tau$ by $\tau$. We often drop the time variable, and write $v = u_1 * u_2 = u_2 * u_1$.

An  LTI  system  is  completely  characterized  by  its  impulse  response:  As  derived  in 
(2.33),  the  output  y  of  an  LTI  system  is  the  convolution  of  the  input  signal  u  and  the  system 
impulse  response  h.  That  is,  y  =  u*h.  From  (2.34),  we  realize  that  the  role  of  the  signal  and  the 
system  can  be  exchanged:  that  is,  we  would  get  the  same  output  y  if  a  signal  h  is  sent  through 
a  system  with  impulse  response  u. 

Flip and slide: Consider the expression for the convolution in (2.34):

$$v(t) = \int_{-\infty}^{\infty} u_1(\tau)u_2(t - \tau)\,d\tau$$

Fix a value of time $t$ at which we wish to evaluate $v$. In order to compute $v(t)$, we must multiply two functions of a “dummy variable” $\tau$ and then integrate over $\tau$. In particular, $s_2(\tau) = u_2(-\tau)$ is the signal $u_2(\tau)$ flipped around the origin, so that $u_2(t - \tau) = u_2(-(\tau - t)) = s_2(\tau - t)$ is $s_2(\tau)$ translated to the right by $t$ (if $t < 0$, translation to the right by $t$ actually corresponds to translation to the left by $|t|$). In short, the mechanics of computing the convolution involves flipping and sliding one of the signals, multiplying with the other signal, and integrating. Pictures are extremely helpful when doing such computations by hand, as illustrated by the following example.


Figure 2.10: Illustrating the flip and slide operation for the convolution of two rectangular pulses. The flipped pulse $u_2(t - \tau)$ occupies $t - 3 \le \tau \le t - 1$ and slides past $u_1(\tau)$; panels: (a) $t - 1 < 5$; (b) $t - 3 < 5$, $t - 1 > 5$; (c) $t - 3 > 5$, $t - 1 < 11$; (d) $t - 3 < 11$, $t - 1 > 11$; (e) $t - 3 > 11$.


Figure 2.11: The convolution of the two rectangular pulses in Example 2.3.3 results in a trapezoidal pulse.


Example 2.3.3 (Convolving rectangular pulses) Consider the rectangular pulses $u_1(t) = I_{[5,11]}(t)$ and $u_2(t) = I_{[1,3]}(t)$. We wish to compute the convolution

$$v(t) = (u_1 * u_2)(t) = \int_{-\infty}^{\infty} u_1(\tau)u_2(t - \tau)\,d\tau$$

We now draw pictures of the signals involved in these “flip and slide” computations in order to figure out the limits of integration for different ranges of $t$. Figure 2.10 shows that there are five different ranges of interest, and yields the following result:

(a) For $t < 6$, $u_1(\tau)u_2(t - \tau) = 0$, so that $v(t) = 0$.

(b) For $6 \le t \le 8$, $u_1(\tau)u_2(t - \tau) = 1$ for $5 \le \tau \le t - 1$, so that

$$v(t) = \int_5^{t-1} d\tau = t - 6$$

(c) For $8 \le t \le 12$, $u_1(\tau)u_2(t - \tau) = 1$ for $t - 3 \le \tau \le t - 1$, so that

$$v(t) = \int_{t-3}^{t-1} d\tau = 2$$

(d) For $12 \le t \le 14$, $u_1(\tau)u_2(t - \tau) = 1$ for $t - 3 \le \tau \le 11$, so that

$$v(t) = \int_{t-3}^{11} d\tau = 11 - (t - 3) = 14 - t$$

(e) For $t > 14$, $u_1(\tau)u_2(t - \tau) = 0$, so that $v(t) = 0$.

The  result  of  the  convolution  is  the  trapezoidal  pulse  sketched  in  Figure  2.11. 
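The piecewise answer can be cross-checked by brute-force numerical integration of (2.34). In this sketch the helper names, the step size, and the integration range $[0, 20]$ (which covers the support of $u_1$) are illustrative choices.

```python
def u1(t: float) -> float:
    return 1.0 if 5 <= t <= 11 else 0.0      # I_[5,11](t)


def u2(t: float) -> float:
    return 1.0 if 1 <= t <= 3 else 0.0       # I_[1,3](t)


def conv_at(t: float, delta: float = 1e-3, lo: float = 0.0, hi: float = 20.0) -> float:
    """Midpoint-rule approximation of v(t) = integral of u1(tau) u2(t - tau) over [lo, hi]."""
    n = int(round((hi - lo) / delta))
    total = 0.0
    for k in range(n):
        tau = lo + (k + 0.5) * delta
        total += u1(tau) * u2(t - tau)
    return total * delta


def v_closed(t: float) -> float:
    """Piecewise result (a)-(e) of Example 2.3.3."""
    if t < 6 or t > 14:
        return 0.0
    if t <= 8:
        return t - 6
    if t <= 12:
        return 2.0
    return 14 - t


for t in (5.0, 7.0, 10.0, 13.5, 15.0):
    print(f"t = {t:4.1f}: numerical = {conv_at(t):.3f}, closed form = {v_closed(t):.3f}")
```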
Figure 2.12: Convolution of two rectangular pulses as a function of pulse durations. The trapezoidal pulse reduces to a triangular pulse for equal pulse durations.


It is useful to record the general form of the convolution between two rectangular pulses of the form $I_{[-a/2,a/2]}(t)$ and $I_{[-b/2,b/2]}(t)$, where we take $a \le b$ without loss of generality. The result is a trapezoidal pulse, which reduces to a triangular pulse for $a = b$, as shown in Figure 2.12. Once we know this, using the LTI property, we can infer the convolution of any signals which can be expressed as a linear combination of shifts of rectangular pulses.
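For reference, one way to write that trapezoid explicitly, derived here by the same flip-and-slide reasoning as in Example 2.3.3: for $0 < a \le b$, the convolution equals $a$ for $|t| \le (b-a)/2$, equals $(a+b)/2 - |t|$ for $(b-a)/2 \le |t| \le (a+b)/2$, and is zero otherwise. The sketch below codes this (the name `rect_conv` is hypothetical) and recovers Example 2.3.3 through the LTI property: $I_{[5,11]}$ is a width-6 pulse centered at 8 and $I_{[1,3]}$ is a width-2 pulse centered at 2, so their convolution is the centered result shifted to $8 + 2 = 10$.

```python
def rect_conv(t: float, a: float, b: float) -> float:
    """Convolution of I_[-a/2, a/2] and I_[-b/2, b/2] evaluated at t, for 0 < a <= b.

    Flat top of height a for |t| <= (b - a)/2, linear ramps reaching zero at
    |t| = (a + b)/2; for a == b the flat top shrinks to a point (a triangle)."""
    if not 0 < a <= b:
        raise ValueError("need 0 < a <= b")
    at = abs(t)
    if at <= (b - a) / 2:
        return a
    if at <= (a + b) / 2:
        return (a + b) / 2 - at
    return 0.0


# Example 2.3.3: widths 2 and 6, centers 2 and 8, so the result is centered at 10
for t in (5.0, 7.0, 10.0, 13.5, 15.0):
    print(t, rect_conv(t - 10.0, 2.0, 6.0))    # matches v(t) from the example
```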

Occasional notational sloppiness can be useful: As the preceding example shows, a convolution computation as in (2.34) requires a careful distinction between the variable $t$ at which the convolution is being evaluated, and the dummy variable $\tau$. This is why we make sure that the dummy variable does not appear in our notation $(s * r)(t)$ for the convolution between signals $s(t)$ and $r(t)$. However, it is sometimes convenient to abuse notation and use the notation $s(t) * r(t)$ instead, as long as we remain aware of what we are doing. For example, this enables us to compactly state the following linear time invariance (LTI) property:

$$\left(a_1 s_1(t - t_1) + a_2 s_2(t - t_2)\right) * r(t) = a_1 (s_1 * r)(t - t_1) + a_2 (s_2 * r)(t - t_2)$$

for any complex gains $a_1$ and $a_2$, and any time offsets $t_1$ and $t_2$.
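The LTI property has an exact discrete time counterpart that is easy to check numerically. The Python sketch below convolves a weighted, shifted combination of two sequences with $r[n]$ and compares the result against the same combination of shifted convolutions. The helpers `conv`, `delay`, `add`, `scale` and the test sequences are made up for illustration; delays are restricted to nonnegative integers.

```python
def conv(x, h):
    """Full linear convolution of two finite sequences (same length convention as Matlab conv)."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y

def delay(x, d):
    """Delay a finite sequence by d >= 0 samples by prepending d zeros."""
    return [0j] * d + list(x)

def add(x, y):
    """Elementwise sum, zero-padding the shorter sequence."""
    n = max(len(x), len(y))
    x = list(x) + [0j] * (n - len(x))
    y = list(y) + [0j] * (n - len(y))
    return [p + q for p, q in zip(x, y)]

def scale(a, x):
    return [a * xi for xi in x]

s1, s2, r = [1, 2, 3], [0.5, -1], [1, 0, -1, 2]
a1, a2, t1, t2 = 2 - 1j, 0.5j, 2, 5      # complex gains and integer delays

lhs = conv(add(scale(a1, delay(s1, t1)), scale(a2, delay(s2, t2))), r)
rhs = add(scale(a1, delay(conv(s1, r), t1)), scale(a2, delay(conv(s2, r), t2)))
print(max(abs(p - q) for p, q in zip(lhs, rhs)))   # should be zero up to rounding
```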

Example  2.3.4  (Modeling  a  multipath  channel)  We  can  get  a  delayed  version  of  a  signal 
by  convolving  it  with  a  delayed  impulse  as  follows: 

$$y_1(t) = u(t) * \delta(t - t_1) = u(t - t_1) \qquad (2.35)$$

To see this, compute

$$y_1(t) = \int u(\tau)\,\delta(t - \tau - t_1)\,d\tau = \int u(\tau)\,\delta(\tau - (t - t_1))\,d\tau = u(t - t_1)$$

where we first use the fact that the delta function is even, and then use its sifting property.


Figure  2.13:  Multipath  channels  typical  of  wireless  communication  can  include  line  of  sight  (LOS) 
and  reflected  paths. 


Equation (2.35) immediately tells us how to model multipath channels, in which multiple scattered versions of a transmitted signal $u(t)$ combine to give a received signal $y(t)$ which is a superposition of delayed versions of the transmitted signal, as illustrated in Figure 2.13:

$$y(t) = a_1 u(t - \tau_1) + \ldots + a_m u(t - \tau_m)$$

(plus noise, which we have not talked about yet). From (2.35), we see that we can write

$$y(t) = a_1 u(t) * \delta(t - \tau_1) + \ldots + a_m u(t) * \delta(t - \tau_m) = u(t) * \left(a_1 \delta(t - \tau_1) + \ldots + a_m \delta(t - \tau_m)\right)$$

That is, we can model the received signal as a convolution of the transmitted signal with a channel impulse response which is a linear combination of time-shifted impulses:

$$h(t) = a_1 \delta(t - \tau_1) + \ldots + a_m \delta(t - \tau_m) \qquad (2.36)$$

Figure 2.14 illustrates how a rectangular pulse spreads as it goes through a multipath channel with impulse response $h(t) = \delta(t - 1) - 0.5\,\delta(t - 1.5) + 0.5\,\delta(t - 3.5)$. While the gains $\{a_k\}$ in this example are real-valued, as we shall soon see (in Section 2.8), we need to allow both the signal $u(t)$ and the gains $\{a_k\}$ to take complex values in order to model, for example, signals carrying information over radio channels.
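The following Python sketch reproduces the channel output of Figure 2.14 in discrete time. It assumes the input is $u(t) = I_{[0,2]}(t)$, which matches the breakpoints 1, 3, 3.5 and 5.5 in the figure, and an arbitrary sampling interval $T_s = 0.25$. Because convolving with $\delta(t - \tau_k)$ is a pure delay by (2.35), each path becomes a single tap of weight $a_k$ at index $\tau_k / T_s$, with no extra $T_s$ scaling.

```python
Ts = 0.25                                       # sampling interval (assumed)
u = [1.0] * int(2 / Ts)                         # samples of u(t) = I_[0,2)(t) at t = n*Ts
paths = [(1.0, 1.0), (-0.5, 1.5), (0.5, 3.5)]   # (gain a_k, delay tau_k) of h(t)

# Discrete equivalent of h(t): weight a_k at tap index tau_k / Ts
ntaps = int(round(max(d for _, d in paths) / Ts)) + 1
h = [0.0] * ntaps
for a, d in paths:
    h[int(round(d / Ts))] += a

def conv(x, g):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(g) - 1)
    for i, xi in enumerate(x):
        for k, gk in enumerate(g):
            y[i + k] += xi * gk
    return y

y = conv(u, h)                                  # y[n] is the output at t = n*Ts
for t in (1.25, 2.0, 3.25, 4.5):
    print(t, y[int(round(t / Ts))])
```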


[Figure 2.14 panels: rectangular pulse $u(t)$; multipath channel impulse response $h(t)$; channel output $y(t) = (u * h)(t)$.]

Figure 2.14: A rectangular pulse through a multipath channel.
Figure 2.15: Complex exponentials are eigenfunctions of LTI systems.


Complex exponential through an LTI system: In order to understand LTI systems in the frequency domain, let us consider what happens to a complex exponential $u(t) = e^{j2\pi f_0 t}$ when it goes through an LTI system with impulse response $h(t)$. The output is given by

$$y(t) = (u * h)(t) = \int_{-\infty}^{\infty} h(\tau)\, e^{j2\pi f_0 (t - \tau)}\,d\tau = H(f_0)\, e^{j2\pi f_0 t} \qquad (2.37)$$

where

$$H(f_0) = \int_{-\infty}^{\infty} h(\tau)\, e^{-j2\pi f_0 \tau}\,d\tau$$

is the Fourier transform of $h$ evaluated at the frequency $f_0$. We discuss the Fourier transform and its properties in more detail shortly.

Complex exponentials are eigenfunctions of LTI systems: Recall that an eigenvector of a matrix $H$ is any vector $x$ that satisfies $Hx = \lambda x$. That is, the matrix leaves its eigenvectors unchanged except for a scale factor $\lambda$, which is the eigenvalue associated with that eigenvector. In an entirely analogous fashion, we see that the complex exponential signal $e^{j2\pi f_0 t}$ is an eigenfunction of the LTI system with impulse response $h$, with eigenvalue $H(f_0)$. Since we have not
constrained  h,  we  conclude  that  complex  exponentials  are  eigenfunctions  of  any  LTI  system.  We 
shall  soon  see,  when  we  discuss  Fourier  transforms,  that  this  eigenfunction  property  allows  us 
to  characterize  LTI  systems  in  the  frequency  domain,  which  in  turn  enables  powerful  frequency 
domain  design  and  analysis  tools. 
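The eigenfunction property can be checked directly for the multipath channel $h(t) = \sum_k a_k \delta(t - \tau_k)$, whose output is simply $\sum_k a_k u(t - \tau_k)$. The Python sketch below assumes the gains and delays of Figure 2.14 and an arbitrarily chosen test frequency $f_0 = 0.3$; it confirms that $y(t)/u(t)$ is the same complex number, $H(f_0) = \sum_k a_k e^{-j2\pi f_0 \tau_k}$, at every time $t$.

```python
import cmath

paths = [(1.0, 1.0), (-0.5, 1.5), (0.5, 3.5)]   # (a_k, tau_k), as in Figure 2.14
f0 = 0.3                                        # test frequency (assumed)

def u(t):
    """Complex exponential input e^{j 2 pi f0 t}."""
    return cmath.exp(2j * cmath.pi * f0 * t)

def y(t):
    """Output of h(t) = sum_k a_k delta(t - tau_k): sum of delayed, scaled inputs."""
    return sum(a * u(t - tau) for a, tau in paths)

# Fourier transform of h evaluated at f0
H_f0 = sum(a * cmath.exp(-2j * cmath.pi * f0 * tau) for a, tau in paths)

for t in (0.0, 0.7, 2.3, 10.0):
    print(t, y(t) / u(t))        # the same value H(f0) for every t
print("H(f0) =", H_f0)
```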


2.3.1  Discrete  time  convolution 

DSP-based implementations of convolutions are inherently discrete time. For two discrete time sequences $\{u_1[n]\}$ and $\{u_2[n]\}$, their convolution $y = u_1 * u_2$ is defined analogous to continuous time convolution, replacing integration by summation:

$$y[n] = \sum_k u_1[k]\, u_2[n - k] \qquad (2.38)$$

Matlab implements this using the "conv" function. This can be interpreted as $u_1$ being the input to a system with impulse response $u_2$, where a discrete time impulse is simply a one, followed by all zeros.
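For readers without Matlab, (2.38) can be written out directly. The Python sketch below assumes finite sequences indexed from 0 and treats values outside the stored range as zero, which gives an output of length `len(u1) + len(u2) - 1`, the same length convention as Matlab's `conv`. The second call illustrates the impulse-response interpretation: convolving with a one followed by zeros passes the input through unchanged (padded with zeros).

```python
def conv(u1, u2):
    """y[n] = sum_k u1[k] u2[n - k] for finite sequences indexed from 0."""
    y = []
    for n in range(len(u1) + len(u2) - 1):
        acc = 0
        for k in range(len(u1)):
            if 0 <= n - k < len(u2):      # u2 is zero outside its stored range
                acc += u1[k] * u2[n - k]
        y.append(acc)
    return y

print(conv([1, 2, 3], [1, 1]))        # [1, 3, 5, 3]
print(conv([1, 2, 3], [1, 0, 0]))     # [1, 2, 3, 0, 0]
```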

Continuous time convolution between $u_1(t)$ and $u_2(t)$ can be approximated using discrete time convolutions between the corresponding sampled signals. For example, for samples at rate $1/T_s$, the infinitesimal $d\tau$ is replaced by the sampling interval $T_s$ as follows:

$$y(t) = (u_1 * u_2)(t) = \int u_1(\tau)\, u_2(t - \tau)\,d\tau \approx \sum_k u_1(kT_s)\, u_2(t - kT_s)\, T_s$$

Evaluating at a sampling time $t = nT_s$, we have

$$y(nT_s) \approx T_s \sum_k u_1(kT_s)\, u_2(nT_s - kT_s)$$

Letting $x[n] = x(nT_s)$ denote the discrete time waveform corresponding to the $n$th sample for each of the preceding waveforms, we have

$$y(nT_s) = y[n] \approx T_s \sum_k u_1[k]\, u_2[n - k] = T_s\, (u_1 * u_2)[n] \qquad (2.39)$$

which shows us how to implement continuous time convolution using discrete time operations.
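A quick way to see the role of the factor $T_s$ in (2.39) is to use a pair whose continuous time convolution is known exactly. The Python sketch below assumes $u_1 = u_2 = I_{[0,1]}$, whose convolution is a triangle of height 1 peaking at $t = 1$, and an arbitrary $T_s = 0.002$; the approximation error stays on the order of $T_s$.

```python
Ts = 2e-3                                   # sampling interval (assumed)
N = int(round(1 / Ts))                      # samples of I_[0,1) at t = 0, Ts, ..., 1 - Ts
u1 = [1.0] * N
u2 = [1.0] * N

def conv(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y

y = [Ts * val for val in conv(u1, u2)]      # (2.39): discrete convolution scaled by Ts

def exact(t):
    """Continuous time result: triangle of height 1 centered at t = 1."""
    return max(0.0, 1.0 - abs(t - 1.0))

for n in (125, 250, 500, 750):
    print(n * Ts, y[n], exact(n * Ts))      # agreement to within about Ts
```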


Figure  2.16:  Two  signals  and  their  continuous  time  convolution,  computed  in  discrete  time  using 
Code  Fragment  2.3.1. 


The following Matlab code provides an example of a continuous time convolution approximated numerically using discrete time convolution, and then plotted against the original continuous time index $t$, as shown in Figure 2.16 (cosmetic touches not included in the code below). The two waveforms convolved are $u_1(t) = t^2 I_{[-1,1]}(t)$ and $u_2(t) = e^{-(t+1)} I_{[-1,\infty)}(t)$ (the latter is truncated in our discrete time implementation).

Code  Fragment  2.3.1  (Discrete  time  computation  of  continuous  time  convolution) 

dt=0.01; %Sampling interval T_s
%%FIRST SIGNAL
u1start=-1; u1end = 1; %Start and end times for first signal
t1=u1start:dt:u1end; %sampling times for first signal
u1=t1.^2; %discrete time version of first signal

%%SECOND SIGNAL (exponential truncated when it gets small)
u2start=-1; u2end = 10;
t2=u2start:dt:u2end;
u2=exp(-(t2+1));

%%Approximation of continuous time convolution
y=dt*conv(u1,u2);

%%PLOT OF SIGNALS AND THEIR CONVOLUTION
ystart=u1start+u2start; %Start time for convolution output
time_axis = ystart:dt:ystart+dt*(length(y)-1);

%%PLOT u1, u2 and y
plot(t1,u1,'r-.');
hold on;
plot(t2,u2,'r:');
plot(time_axis,y);
legend('u1','u2','y','Location','NorthEast');
hold off;


2.3.2  Multi-rate  systems 

While continuous time signals can be converted to discrete time by sampling "fast enough," it is often required that we operate at multiple sampling rates. For example, in digital communication, we may send a string of symbols $\{b[n]\}$ (think of these as taking values +1 or -1 for now) by modulating them onto shifted versions of a pulse $p(t)$ as follows:

$$u(t) = \sum_n b[n]\, p(t - nT) \qquad (2.40)$$

where $1/T$ is the rate at which symbols are generated (termed the symbol rate). In order to represent the analog pulse $p(t)$ as discrete time samples, we may sample it at rate $1/T_s$, typically chosen to be an integer multiple of the symbol rate, so that $T = mT_s$, where $m$ is a positive integer. Typical values employed in transmitter DSP modules might be $m = 4$ or $m = 8$. Thus, the system we are interested in is multi-rate: waveforms are sampled at rate $1/T_s = m/T$, but the input is at rate $1/T$. Set $u[k] = u(kT_s)$ and $p[k] = p(kT_s)$ as the discrete time signals corresponding to samples of the transmitted waveform $u(t)$ and the pulse $p(t)$, respectively. We can write the sampled version of (2.40) as

$$u[k] = \sum_n b[n]\, p(kT_s - nT) = \sum_n b[n]\, p[k - nm] \qquad (2.41)$$

The preceding almost has the form of a discrete time convolution, but the key difference is that the successive symbols are spaced by time $T$, which corresponds to $m > 1$ samples at the sampling rate $1/T_s$. Thus, in order to implement this system using convolution at rate $1/T_s$, we must space out the input symbols by inserting $m - 1$ zeros between successive symbols $b[n]$, thus converting a rate $1/T$ signal to a rate $1/T_s = m/T$ signal. This process is termed upsampling. While the upsampling function is available in certain Matlab toolboxes, we provide a self-contained code fragment below that illustrates its use for digital modulation, and plots the waveform obtained for symbol sequence $-1, +1, +1, -1$. The modulating pulse is a sine pulse: $p(t) = \sin(\pi t/T)\, I_{[0,T]}(t)$, and our convention is to set $T = 1$ without loss of generality (or, equivalently, to replace $t$ by $t/T$). We set the oversampling factor $m = 16$ in order to obtain smooth plots, even though typical implementations in communication transmitters may use smaller values.

Code  Fragment  2.3.2  (Upsampling  for  digital  modulation) 

m=16; %Sampling rate as multiple of symbol rate
%discrete time representation of sine pulse
time_p = 0:1/m:1; %Sampling times over duration of pulse
p = sin(pi*time_p); %Samples of the pulse
%Symbols to be modulated
symbols = [-1;1;1;-1];

%UPSAMPLE BY m
nsymbols = length(symbols); %length of original symbol sequence
nsymbols_upsampled = 1+(nsymbols-1)*m; %length of upsampled symbol sequence
symbols_upsampled = zeros(nsymbols_upsampled,1); %start from all zeros
symbols_upsampled(1:m:nsymbols_upsampled)=symbols; %insert symbols with spacing m
%GENERATE MODULATED SIGNAL BY DISCRETE TIME CONVOLUTION
u=conv(symbols_upsampled,p);

%PLOT MODULATED SIGNAL
time_u = 0:1/m:(length(u)-1)/m; %unit of time = symbol time T
plot(time_u,u);
xlabel('t/T');

Figure 2.17: Digitally modulated waveform obtained using Code Fragment 2.3.2.
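As a cross-check of Code Fragment 2.3.2, the Python sketch below (not part of the original fragments) builds the same modulated signal in two ways: by upsampling and convolving, and by evaluating the sum in (2.41) directly. It assumes the same sine pulse, $m = 16$ and symbols $-1, +1, +1, -1$; the two results should agree to rounding error.

```python
import math

m = 16                                                 # samples per symbol
p = [math.sin(math.pi * i / m) for i in range(m + 1)]  # sine pulse sampled on [0, T]
b = [-1, 1, 1, -1]                                     # symbols

# Route 1: upsample (m - 1 zeros between symbols), then convolve with the pulse
up = [0.0] * (1 + (len(b) - 1) * m)
up[::m] = b

def conv(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y

u_conv = conv(up, p)

# Route 2: evaluate (2.41) directly, u[k] = sum_n b[n] p[k - n m]
def u_direct(k):
    return sum(bn * p[k - n * m] for n, bn in enumerate(b) if 0 <= k - n * m < len(p))

print(max(abs(u_conv[k] - u_direct(k)) for k in range(len(u_conv))))   # ~0
```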


2.4  Fourier  Series 

Fourier series represent periodic signals in terms of sinusoids or complex exponentials. A signal $u(t)$ is periodic with period $T$ if $u(t + T) = u(t)$ for all $t$. Note that, if $u$ is periodic with period $T$, then it is also periodic with period $nT$, where $n$ is any positive integer. The smallest positive $T$ for which $u(t)$ is periodic is termed the fundamental period. Let us denote this by $T_0$, and define the corresponding fundamental frequency $f_0 = 1/T_0$ (measured in Hertz if $T_0$ is measured in seconds). It is easy to show that if $u(t)$ is periodic with period $T$, then $T$ must be an integer multiple of $T_0$. In the following, we often simply refer to the fundamental period as "period."

Using mathematical machinery beyond our current scope, it can be shown that any periodic signal with period $T_0$ (subject to mild technical conditions) can be expressed as a linear combination of complex exponentials

$$\psi_m(t) = e^{j2\pi m f_0 t}, \quad m = 0, \pm 1, \pm 2, \ldots$$

whose frequencies are integer multiples of the fundamental frequency $f_0$. That is, we can write

$$u(t) = \sum_{n=-\infty}^{\infty} u_n \psi_n(t) = \sum_{n=-\infty}^{\infty} u_n e^{j2\pi n f_0 t} \qquad (2.42)$$
The coefficients $\{u_n\}$ are in general complex-valued, and are called the Fourier series for $u(t)$. They can be computed as follows:

$$u_k = \frac{1}{T_0} \int_{T_0} u(t)\, e^{-j2\pi k f_0 t}\,dt \qquad (2.43)$$

where $\int_{T_0}$ denotes an integral over any interval of length $T_0$.


Let us now derive (2.43). Consider an arbitrary interval of length $T_0$, of the form $[D, D + T_0]$, where the offset $D$ is free to take on any real value. Then, for any nonzero integer $m$, we have

$$\int_D^{D+T_0} e^{j2\pi m f_0 t}\,dt = \left. \frac{e^{j2\pi m f_0 t}}{j2\pi m f_0} \right|_D^{D+T_0} = \frac{e^{j(2\pi m f_0 D + 2\pi m)} - e^{j2\pi m f_0 D}}{j2\pi m f_0} = 0 \qquad (2.44)$$

since $e^{j2\pi m} = 1$. Thus, when we multiply both sides of (2.42) by $e^{-j2\pi k f_0 t}$ and integrate over a period, all terms corresponding to $n \neq k$ drop out by virtue of (2.44), and we are left only with the $n = k$ term:

$$\int_D^{D+T_0} u(t)\, e^{-j2\pi k f_0 t}\,dt = u_k \int_D^{D+T_0} dt + \sum_{n \neq k} u_n \int_D^{D+T_0} e^{j2\pi (n-k) f_0 t}\,dt = u_k T_0 + 0$$

which proves (2.43).

We denote the Fourier series relationship (2.42)-(2.43) as $u(t) \leftrightarrow \{u_n\}$. It is useful to keep in mind the geometric meaning of this relationship. The space of periodic signals with period $T_0 = 1/f_0$ can be thought of in the same way as the finite-dimensional vector spaces we are familiar with, except that the inner product between two periodic signals is given by

$$\langle u, v \rangle_{T_0} = \int_{T_0} u(t)\, v^*(t)\,dt$$

The energy over a period for a signal $u$ is given by $\|u\|_{T_0}^2 = \langle u, u \rangle_{T_0}$, where $\|u\|_{T_0}$ denotes the norm computed over a period. We have assumed that the Fourier basis $\{\psi_n(t)\}$ spans this vector space, and have computed the Fourier series after showing that the basis is orthogonal:

$$\langle \psi_n, \psi_m \rangle_{T_0} = 0, \quad n \neq m$$

and of equal energy:

$$\|\psi_n\|_{T_0}^2 = \langle \psi_n, \psi_n \rangle_{T_0} = T_0$$
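The orthogonality and equal-energy relations are easy to confirm numerically, and doing so with an arbitrary offset $D$ illustrates that any interval of length $T_0$ works. The Python sketch below assumes $T_0 = 2$ and $D = 0.37$ (both arbitrary) and uses a midpoint rule; with $N$ equally spaced points per period, the discrete sum of $e^{j2\pi(n-m)f_0 t}$ vanishes for $0 < |n - m| < N$, so the orthogonality holds up to rounding.

```python
import cmath

T0 = 2.0                     # period (assumed)
f0 = 1.0 / T0
D = 0.37                     # arbitrary offset of the interval [D, D + T0]
N = 4000                     # midpoint samples per period

def psi(n, t):
    """Fourier basis function psi_n(t) = exp(j 2 pi n f0 t)."""
    return cmath.exp(2j * cmath.pi * n * f0 * t)

def inner(n, m):
    """Midpoint-rule approximation of <psi_n, psi_m>_T0 over [D, D + T0]."""
    dt = T0 / N
    return sum(psi(n, D + (i + 0.5) * dt) * psi(m, D + (i + 0.5) * dt).conjugate()
               for i in range(N)) * dt

print(abs(inner(3, 1)), abs(inner(-2, 5)))   # both ~0: orthogonal
print(inner(4, 4))                            # ~T0: equal energy
```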

The computation of the expression for the Fourier series $\{u_k\}$ can be rewritten in these vector space terms as follows. A periodic signal $u(t)$ can be expanded in terms of the Fourier basis as

$$u(t) = \sum_{n=-\infty}^{\infty} u_n \psi_n(t) \qquad (2.45)$$

Using the orthogonality of the basis functions, we have

$$\langle u, \psi_k \rangle_{T_0} = \sum_n u_n \langle \psi_n, \psi_k \rangle_{T_0} = u_k \|\psi_k\|_{T_0}^2$$

That is,

$$u_k = \frac{\langle u, \psi_k \rangle_{T_0}}{\|\psi_k\|_{T_0}^2} = \frac{\langle u, \psi_k \rangle_{T_0}}{T_0} \qquad (2.46)$$

In general, the Fourier series of an arbitrary periodic signal may have an infinite number of terms. In practice, one might truncate the Fourier series at a finite number of terms, with the number of terms required to provide a good approximation to the signal depending on the nature of the signal.


Figure 2.18: Square wave with period $T_0$.


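Example 2.4.1 Fourier series of a square wave: Consider the periodic waveform $u(t)$ as shown in Figure 2.18. For $k = 0$, we get the DC value $u_0 = (A_{\max} + A_{\min})/2$. For $k \neq 0$, we have, using (2.43), that

$$u_k = \frac{1}{T_0} \int_{-T_0/2}^{0} A_{\min}\, e^{-j2\pi k t/T_0}\,dt + \frac{1}{T_0} \int_{0}^{T_0/2} A_{\max}\, e^{-j2\pi k t/T_0}\,dt$$

$$= \frac{A_{\min}}{T_0} \left. \frac{e^{-j2\pi k t/T_0}}{-j2\pi k/T_0} \right|_{-T_0/2}^{0} + \frac{A_{\max}}{T_0} \left. \frac{e^{-j2\pi k t/T_0}}{-j2\pi k/T_0} \right|_{0}^{T_0/2} = \frac{A_{\min}\left(1 - e^{j\pi k}\right) + A_{\max}\left(e^{-j\pi k} - 1\right)}{-j2\pi k}$$

For $k$ even, $e^{j\pi k} = e^{-j\pi k} = 1$, which yields $u_k = 0$. That is, there are no even harmonics. For $k$ odd, $e^{j\pi k} = e^{-j\pi k} = -1$, which yields $u_k = \frac{A_{\max} - A_{\min}}{j\pi k}$. We therefore obtain

$$u_k = \begin{cases} 0, & k \text{ even},\ k \neq 0 \\ \dfrac{A_{\max} - A_{\min}}{j\pi k}, & k \text{ odd} \end{cases}$$

Combining the terms for positive and negative $k$, we obtain

$$u(t) = \frac{A_{\max} + A_{\min}}{2} + \sum_{k > 0,\ k \text{ odd}} \frac{2(A_{\max} - A_{\min})}{\pi k} \sin(2\pi k t/T_0)$$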
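The closed-form coefficients can be checked by evaluating (2.43) numerically. The Python sketch below assumes the square wave takes the value $A_{\min}$ on $[-T_0/2, 0)$ and $A_{\max}$ on $[0, T_0/2)$, as the integration limits in the derivation indicate, with illustrative values $T_0 = 1$, $A_{\max} = 2$, $A_{\min} = -0.5$. The discontinuity at $t = 0$ falls on a cell boundary of the midpoint grid, so the numerical integral is accurate.

```python
import cmath

T0, A_max, A_min = 1.0, 2.0, -0.5     # assumed example values
N = 10000                             # midpoint samples over one period (even)

def u(t):
    """One period of the square wave: A_min on [-T0/2, 0), A_max on [0, T0/2)."""
    return A_min if t < 0 else A_max

def coeff(k):
    """Midpoint-rule evaluation of (2.43) over [-T0/2, T0/2)."""
    dt = T0 / N
    total = 0j
    for i in range(N):
        t = -T0 / 2 + (i + 0.5) * dt
        total += u(t) * cmath.exp(-2j * cmath.pi * k * t / T0)
    return total * dt / T0

def closed_form(k):
    """Result of Example 2.4.1."""
    if k == 0:
        return (A_max + A_min) / 2
    if k % 2 == 0:
        return 0.0                    # no even harmonics
    return (A_max - A_min) / (1j * cmath.pi * k)

for k in range(-3, 4):
    print(k, coeff(k), closed_form(k))
```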


Example 2.4.2 Fourier series of an impulse train: Even though the delta function is not physically realizable, the Fourier series of an impulse train, as shown in Figure 2.19, turns out to be extremely useful in theoretical development and in computations. Specifically, consider

$$u(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT_0)$$

Figure 2.19: An impulse train of period $T_0$.

By integrating over an interval of length $T_0$ centered around the origin, we obtain

$$u_k = \frac{1}{T_0} \int_{-T_0/2}^{T_0/2} \sum_{n=-\infty}^{\infty} \delta(t - nT_0)\, e^{-j2\pi k f_0 t}\,dt = \frac{1}{T_0} \int_{-T_0/2}^{T_0/2} \delta(t)\, e^{-j2\pi k f_0 t}\,dt = \frac{1}{T_0}$$

using  the  sifting  property  of  the  impulse.  That  is,  the  delta  function  has  equal  frequency  content 
at  all  harmonics.  This  is  yet  another  manifestation  of  the  physical  unrealizability  of  the  impulse: 
for  well-behaved  signals,  the  Fourier  series  should  decay  as  the  frequency  increases. 


While we have considered signals which are periodic functions of time, the concept of Fourier series applies to periodic functions in general, whatever the physical interpretation of the argument of the function. In particular, as we shall see when we discuss the effect of time domain sampling in the context of digital communication, the time domain samples of a waveform can be interpreted as the Fourier series for a particular periodic function of frequency.


2.4.1  Fourier  Series  Properties  and  Applications 

We now state some Fourier series properties which are helpful both for computation and for developing intuition. The derivations are omitted, since they follow in a straightforward manner from (2.42)-(2.43), and are included in any standard text on signals and systems. In the following, $u(t)$, $v(t)$ denote periodic waveforms of period $T_0$ and Fourier series $\{u_k\}$, $\{v_k\}$ respectively.

Linearity: For arbitrary complex numbers $\alpha$, $\beta$,

$$\alpha u(t) + \beta v(t) \leftrightarrow \{\alpha u_k + \beta v_k\}$$

Time delay corresponds to linear phase in frequency domain:

$$u(t - d) \leftrightarrow \{u_k\, e^{-j2\pi k f_0 d}\}$$

The Fourier series of a real-valued signal is conjugate symmetric: If $u(t)$ is real-valued, then $u_k = u_{-k}^*$.

Harmonic structure of real-valued periodic signals: While both the Fourier series coefficients and the complex exponential basis functions are complex-valued, for real-valued $u(t)$, the linear combination on the right-hand side of (2.42) must be real-valued. In particular, as we show below, the terms corresponding to $u_k$ and $u_{-k}$ ($k \geq 1$) combine together into a real-valued sinusoid which we term the $k$th harmonic. Specifically, writing $u_k = A_k e^{j\phi_k}$ in polar form, we invoke the conjugate symmetry of the Fourier series for real-valued $u(t)$ to infer that $u_{-k} = u_k^* = A_k e^{-j\phi_k}$. The Fourier series can therefore be written as

$$u(t) = u_0 + \sum_{k=1}^{\infty} \left( u_k e^{j2\pi k f_0 t} + u_{-k} e^{-j2\pi k f_0 t} \right) = u_0 + \sum_{k=1}^{\infty} A_k \left( e^{j(2\pi k f_0 t + \phi_k)} + e^{-j(2\pi k f_0 t + \phi_k)} \right)$$

This yields the following Fourier series in terms of real-valued sinusoids:

$$u(t) = u_0 + \sum_{k=1}^{\infty} 2A_k \cos(2\pi k f_0 t + \phi_k) = u_0 + \sum_{k=1}^{\infty} 2|u_k| \cos\left(2\pi k f_0 t + \angle u_k\right) \qquad (2.47)$$
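The equivalence between the complex exponential form (2.42) and the harmonic form (2.47) can be checked for any conjugate-symmetric set of coefficients. The Python sketch below uses made-up coefficients $u_0, u_1, u_2, u_3$ (with $u_{-k} = u_k^*$) and an assumed $f_0 = 1.5$; the complex sum comes out real, and it matches the sum of harmonics.

```python
import cmath
import math

f0 = 1.5                                            # fundamental frequency (assumed)
u0 = 0.4                                            # real DC coefficient
pos = {1: 0.8 - 0.3j, 2: -0.2j, 3: 0.1 + 0.05j}     # u_k for k >= 1 (made-up values)

coeffs = {0: u0}
for k, c in pos.items():
    coeffs[k] = c
    coeffs[-k] = c.conjugate()                      # conjugate symmetry of a real signal

def complex_form(t):
    """Right-hand side of (2.42), truncated to the coefficients above."""
    return sum(c * cmath.exp(2j * math.pi * k * f0 * t) for k, c in coeffs.items())

def harmonic_form(t):
    """Right-hand side of (2.47): u0 + sum_k 2|u_k| cos(2 pi k f0 t + angle(u_k))."""
    return u0 + sum(2 * abs(c) * math.cos(2 * math.pi * k * f0 * t + cmath.phase(c))
                    for k, c in pos.items())

for t in (0.0, 0.21, 0.9):
    z = complex_form(t)
    print(t, z.real, abs(z.imag), harmonic_form(t))   # imaginary part ~0
```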


Differentiation amplifies higher frequencies:

$$x(t) = \frac{d}{dt} u(t) \leftrightarrow x_k = j2\pi k f_0 u_k \qquad (2.48)$$

Note that differentiation kills the DC term, i.e., $x_0 = 0$. However, the information at all other frequencies is preserved. That is, if we know $\{x_k\}$, then we can recover $\{u_k, k \neq 0\}$ as follows:

$$u_k = \frac{x_k}{j2\pi f_0 k}, \quad k \neq 0 \qquad (2.49)$$

This is a useful property, since differentiation often makes Fourier series easier to compute.


[Figure 2.20 panels: square wave $u(t)$ alternating between $A_{\max}$ and $A_{\min}$; its derivative $du/dt$, consisting of impulses at the upward and downward edges.]

Figure 2.20: The derivative of a square wave is two interleaved impulse trains.


Example  2.4.1  redone  (using  differentiation  to  simplify  Fourier  series  computation): 

Differentiating the square wave in Figure 2.18 gives us two interleaved impulse trains, one corresponding to the upward edges of the rectangular pulses, and the other to the downward edges of the rectangular pulses, as shown in Figure 2.20.

$$x(t) = \frac{d}{dt} u(t) = (A_{\max} - A_{\min}) \sum_k \delta(t - kT_0) - (A_{\max} - A_{\min}) \sum_k \delta(t - kT_0 - T_0/2)$$

Compared to the impulse train in Example 2.4.2, the first impulse train above is offset by 0, while the second is offset by $T_0/2$ (and inverted). We can therefore infer their Fourier series using the time delay property, and add them up by linearity, to obtain

$$x_k = \frac{A_{\max} - A_{\min}}{T_0} - \frac{A_{\max} - A_{\min}}{T_0} e^{-j2\pi f_0 k T_0/2} = \frac{A_{\max} - A_{\min}}{T_0} \left(1 - e^{-j\pi k}\right), \quad k \neq 0$$

Using the differentiation property, we can therefore infer that

$$u_k = \frac{x_k}{j2\pi f_0 k} = \frac{A_{\max} - A_{\min}}{j2\pi k f_0 T_0} \left(1 - e^{-j\pi k}\right) = \frac{A_{\max} - A_{\min}}{j2\pi k} \left(1 - e^{-j\pi k}\right), \quad k \neq 0$$

which gives us the same result as before. Note that the DC term $u_0$ cannot be obtained using this approach, since it vanishes upon differentiation. But it is easy to compute, since it is just the average value of $u(t)$, which can be seen to be $u_0 = (A_{\max} + A_{\min})/2$ by inspection.
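The differentiation route can be compared term by term with the direct result of Example 2.4.1. The Python sketch below assumes illustrative values $T_0 = 1$, $A_{\max} = 2$, $A_{\min} = -0.5$, computes $x_k$ for the two interleaved impulse trains, applies (2.49), and refuses $k = 0$, since the DC term is lost by differentiation.

```python
import cmath

T0, A_max, A_min = 1.0, 2.0, -0.5      # assumed example values
f0 = 1.0 / T0

def x_coeff(k):
    """Fourier series of du/dt: upward-edge train minus train delayed by T0/2."""
    jump = A_max - A_min
    return jump / T0 - (jump / T0) * cmath.exp(-2j * cmath.pi * f0 * k * T0 / 2)

def u_coeff_from_derivative(k):
    """(2.49): u_k = x_k / (j 2 pi f0 k), valid only for k != 0."""
    if k == 0:
        raise ValueError("DC term is lost by differentiation")
    return x_coeff(k) / (2j * cmath.pi * f0 * k)

def u_coeff_direct(k):
    """Result of Example 2.4.1 for k != 0."""
    return 0.0 if k % 2 == 0 else (A_max - A_min) / (1j * cmath.pi * k)

for k in (1, 2, 3, -5):
    print(k, u_coeff_from_derivative(k), u_coeff_direct(k))
```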

In addition to simplifying computation for waveforms which can be described (or approximated) as piecewise polynomial functions of time (so that enough differentiation ultimately reduces them to impulse trains), the differentiation method explicitly reveals how the harmonic structure (i.e., the strength and location of the harmonics) of a periodic waveform is related to its transitions in the time domain. Once we understand the harmonic structure, we can shape it by appropriate filtering. For example, if we wish to generate a sinusoid of frequency 300 MHz using a digital circuit capable of generating symmetric square waves of frequency 100 MHz, we can choose a filter to isolate the third harmonic. However, we cannot generate a sinusoid of frequency 200 MHz (unless we make the square wave suitably asymmetric), since the even harmonics do not exist for a symmetric square wave (i.e., a square wave whose high and low durations are the same).

Parseval's identity (periodic inner product/power can be computed in either time or frequency domain): Using the orthogonality of complex exponentials over a period, it can be shown that

$$\langle u, v \rangle_{T_0} = \int_{T_0} u(t)\, v^*(t)\,dt = T_0 \sum_{k=-\infty}^{\infty} u_k v_k^* \qquad (2.50)$$

Setting $v = u$, and dividing both sides by $T_0$, the preceding specializes to an expression for signal power (which can be computed for a periodic signal by averaging over a period):

$$\frac{1}{T_0} \int_{T_0} |u(t)|^2\,dt = \sum_{k=-\infty}^{\infty} |u_k|^2 \qquad (2.51)$$
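The square wave of Example 2.4.1 gives a concrete check of (2.51). In the time domain the power is $(A_{\max}^2 + A_{\min}^2)/2$, since each level is held for half a period. The Python sketch below (illustrative levels $A_{\max} = 2$, $A_{\min} = -0.5$) sums $|u_k|^2$ for $|k| \leq K$ using the coefficients from that example and shows the partial sums approaching the time-domain power, with a tail that shrinks roughly like $1/K$.

```python
import math

A_max, A_min = 2.0, -0.5                   # assumed square-wave levels (T0 drops out)

time_power = (A_max**2 + A_min**2) / 2     # average of |u(t)|^2 over a period

def freq_power(K):
    """Right-hand side of (2.51) truncated to |k| <= K, using u_k from Example 2.4.1."""
    total = ((A_max + A_min) / 2) ** 2                      # |u_0|^2
    for k in range(1, K + 1, 2):                            # odd k only
        total += 2 * ((A_max - A_min) / (math.pi * k)) ** 2  # |u_k|^2 + |u_{-k}|^2
    return total

for K in (1, 11, 101, 10001):
    print(K, freq_power(K), time_power)
```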


2.5  Fourier  Transform 

We define the Fourier transform $U(f)$ for an aperiodic, finite energy waveform $u(t)$ as

$$U(f) = \int_{-\infty}^{\infty} u(t)\, e^{-j2\pi f t}\,dt, \quad -\infty < f < \infty \qquad \text{Fourier Transform (2.52)}$$

The inverse Fourier transform is given by

$$u(t) = \int_{-\infty}^{\infty} U(f)\, e^{j2\pi f t}\,df, \quad -\infty < t < \infty \qquad \text{Inverse Fourier Transform (2.53)}$$

The inverse Fourier transform tells us that any finite energy signal can be written as a linear combination of a continuum of complex exponentials, with the coefficients of the linear combination given by the Fourier transform $U(f)$.

Notation: We call a signal and its Fourier transform a Fourier transform pair, and denote them as $u(t) \leftrightarrow U(f)$. We also denote the Fourier transform operation by $\mathcal{F}$, so that $U(f) = \mathcal{F}(u(t))$.


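Example 2.5.1 Rectangular pulse and sinc function form a Fourier transform pair: Consider the rectangular pulse $u(t) = I_{[-T/2,T/2]}(t)$ of duration $T$. Its Fourier transform is given by

$$U(f) = \int_{-T/2}^{T/2} e^{-j2\pi f t}\,dt = \left. \frac{e^{-j2\pi f t}}{-j2\pi f} \right|_{-T/2}^{T/2} = \frac{e^{-j\pi f T} - e^{j\pi f T}}{-j2\pi f} = \frac{\sin(\pi f T)}{\pi f} = T\,\mathrm{sinc}(fT)$$

We denote this as

$$I_{[-T/2,T/2]}(t) \leftrightarrow T\,\mathrm{sinc}(fT)$$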
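The pair $I_{[-T/2,T/2]}(t) \leftrightarrow T\,\mathrm{sinc}(fT)$ can be confirmed by evaluating (2.52) numerically. The Python sketch below assumes $T = 2$, uses the convention $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$ from the example above, and approximates the integral with a midpoint rule over the pulse support.

```python
import cmath
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def ft_rect(f, T, N=20000):
    """Midpoint-rule evaluation of (2.52) for u(t) = I_[-T/2,T/2](t)."""
    dt = T / N
    total = 0j
    for i in range(N):
        t = -T / 2 + (i + 0.5) * dt
        total += cmath.exp(-2j * math.pi * f * t)
    return total * dt

T = 2.0                                       # pulse duration (assumed)
for f in (0.0, 0.2, 0.5, 1.3):
    print(f, ft_rect(f, T), T * sinc(f * T))  # imaginary part ~0 since the pulse is even
```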


Duality: Given the similarity of the form of the Fourier transform (2.52) and inverse Fourier transform (2.53), we can see that the roles of time and frequency can be switched simply by negating one of the arguments. In particular, suppose that $u(t) \leftrightarrow U(f)$. Define the time domain signal $s(t) = U(t)$, replacing $f$ by $t$. Then the Fourier transform of $s(t)$ is given by $S(f) = u(-f)$, replacing $t$ by $-f$. Since negating the argument corresponds to reflection around the origin, we can simply switch time and frequency for signals which are symmetric around the origin. Applying duality to Example 2.5.1, we infer that a signal that is ideally bandlimited in frequency corresponds to a sinc function in time:

$$I_{[-W/2,W/2]}(f) \leftrightarrow W\,\mathrm{sinc}(Wt)$$

Application to infinite energy signals: In engineering applications, we routinely apply the Fourier and inverse Fourier transform to infinite energy signals, even though the derivation of the Fourier transform as the limit of a Fourier series is based on the assumption that the signal has finite energy. While infinite energy signals are not physically realizable, they are useful approximations of finite energy signals, often simplifying mathematical manipulations. For example, instead of considering a sinusoid over a large time interval, we can consider a sinusoid of infinite duration. As we shall see, this leads to an impulsive function in the frequency domain. As another example, delta functions in the time domain are useful in modeling the impulse response of wireless multipath channels. Basically, once we are willing to work with impulses, we can use the Fourier transform on a very broad class of signals.


Example 2.5.2 The delta function and the constant function form a Fourier transform pair: For $u(t) = \delta(t)$, we have

$$U(f) = \int_{-\infty}^{\infty} \delta(t)\, e^{-j2\pi f t}\,dt = 1$$

for all $f$. That is,

$$\delta(t) \leftrightarrow I_{(-\infty,\infty)}(f)$$


Now  that  we  have  seen  both  the  Fourier  series  and  the  Fourier  transform,  it  is  worth  commenting 
on  the  following  frequently  asked  questions. 

What do negative frequencies mean? Why do we need them? Consider a real-valued sinusoid $A\cos(2\pi f_0 t + \theta)$, where $f_0 > 0$. If we now replace $f_0$ by $-f_0$, we obtain $A\cos(-2\pi f_0 t + \theta) = A\cos(2\pi f_0 t - \theta)$, using the fact that cosine is an even function. Thus, we do not need negative frequencies when working with real-valued sinusoids. However, unlike complex exponentials, real-valued sinusoids are not eigenfunctions of LTI systems: we can pass a cosine through an LTI system and get a sine, for example. Thus, once we decide to work with a basis formed by complex exponentials, we do need both positive and negative frequencies in order to describe all signals of interest. For example, a real-valued sinusoid can be written in terms of complex exponentials as

$$A\cos(2\pi f_0 t + \theta) = \frac{A}{2} e^{j(2\pi f_0 t + \theta)} + \frac{A}{2} e^{-j(2\pi f_0 t + \theta)}$$

so that we need complex exponentials at both $+f_0$ and $-f_0$ to describe a real-valued sinusoid at frequency $f_0$. Of course, the coefficients multiplying these two complex exponentials are not arbitrary: they are complex conjugates of each other. More generally, as we have already seen, such conjugate symmetry holds for both Fourier series and Fourier transforms of real-valued signals. We can therefore state the following:

(a) We do need both positive and negative frequencies to form a complete basis using complex exponentials;

(b) For real-valued (i.e., physically realizable) signals, the expansion in terms of a complex exponential basis, whether it is the Fourier series or the Fourier transform, exhibits conjugate symmetry. Hence, we only need to know the Fourier series or Fourier transform of a real-valued signal for positive frequencies.


2.5.1  Fourier  Transform  Properties 

The Fourier transform can be obtained by taking the limit of the Fourier series as the period gets large, with $T_0 \to \infty$ and $f_0 \to 0$ (think of an aperiodic signal as periodic with infinite period). We do not provide details, but sketch the process of taking this limit: $T_0 u_k$ tends to $U(f)$, where $f = kf_0$, and the Fourier series sum in (2.42) becomes the inverse Fourier transform integral in (2.53), with $f_0$ becoming $df$. Not surprisingly, therefore, the Fourier transform exhibits properties entirely analogous to those for Fourier series. However, the Fourier transform applies to a broader class of signals, and we can take advantage of time-frequency duality more easily, because both time and frequency are now continuous-valued variables.

We now state some key properties. In the following, $u(t)$, $v(t)$ denote signals with Fourier transforms $U(f)$, $V(f)$, respectively.

Linearity: For arbitrary complex numbers $\alpha$, $\beta$,

$$\alpha u(t) + \beta v(t) \leftrightarrow \alpha U(f) + \beta V(f)$$

Time delay corresponds to linear phase in frequency domain:

$$u(t - t_0) \leftrightarrow U(f)\, e^{-j2\pi f t_0}$$

Frequency shift corresponds to modulation by complex exponential:

$$u(t)\, e^{j2\pi f_0 t} \leftrightarrow U(f - f_0)$$

The Fourier transform of a real-valued signal is conjugate symmetric: If $u(t)$ is real-valued, then $U(f) = U^*(-f)$.

Differentiation in the time domain amplifies higher frequencies:

$$x(t) = \frac{d}{dt} u(t) \leftrightarrow X(f) = j2\pi f\, U(f)$$

As for Fourier series, differentiation kills the DC term, i.e., $X(0) = 0$. However, the information at all other frequencies is preserved. Thus, if we know $X(f)$, then we can recover $U(f)$ for $f \neq 0$ as follows:

$$U(f) = \frac{X(f)}{j2\pi f}, \quad f \neq 0 \qquad (2.54)$$
This specifies the Fourier transform almost everywhere (except at DC: $f = 0$). If $U(f)$ is finite everywhere, then we do not need to worry about its value at a particular point, and can leave $U(0)$ unspecified, or define it as the limit of (2.54) as $f \to 0$ (and if this limit does not exist, we can set $U(0)$ to be the left limit, or the right limit, or any number in between). In short, we can simply adopt (2.54) as the expression for $U(f)$ for all $f$, when $U(0)$ is finite. However, the DC term does matter when $u(t)$ has a nonzero average value, in which case we get an impulse at DC. The average value of $u(t)$ is given by

$$\bar{u} = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} u(t)\,dt$$

and has Fourier transform given by $\bar{u} \leftrightarrow \bar{u}\,\delta(f)$. Thus, we can write the overall Fourier transform as

$$U(f) = \frac{X(f)}{j2\pi f} + \bar{u}\,\delta(f) \qquad (2.55)$$

We  illustrate  this  via  the  following  example. 


Example 2.5.3 (Fourier transform of a step function) Let us use differentiation to compute the Fourier transform of the unit step function

$$u(t) = \begin{cases} 0, & t < 0 \\ 1, & t > 0 \end{cases}$$

Its DC value is given by

$$\bar{u} = 1/2$$

and its derivative is the delta function (see Figure 2.21):

$$x(t) = \frac{d}{dt} u(t) = \delta(t) \leftrightarrow X(f) = 1$$

Figure 2.21: The unit step function and its derivative, the delta function.

Applying (2.55), we obtain that the Fourier transform of the unit step function is given by

$$U(f) = \frac{1}{j2\pi f} + \frac{1}{2}\delta(f)$$


Parseval's identity (inner product/energy can be computed in either time or frequency domain):

$$\langle u, v \rangle = \int_{-\infty}^{\infty} u(t)\, v^*(t)\,dt = \int_{-\infty}^{\infty} U(f)\, V^*(f)\,df$$

Setting $v = u$, we get an expression for the energy of a signal:

$$\|u\|^2 = \int_{-\infty}^{\infty} |u(t)|^2\,dt = \int_{-\infty}^{\infty} |U(f)|^2\,df$$
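The rectangular pulse of Example 2.5.1 gives a simple check of the energy form of Parseval's identity: $\int |u(t)|^2\,dt = T$, so $\int |T\,\mathrm{sinc}(fT)|^2\,df$ must also equal $T$. The Python sketch below assumes $T = 2$ and integrates over $[-F, F]$ with a midpoint rule; because $|U(f)|^2$ decays only like $1/f^2$, the frequency-domain integral approaches $T$ slowly, with a remaining tail of roughly $1/(\pi^2 F)$.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

T = 2.0                                   # duration of u(t) = I_[-T/2,T/2](t) (assumed)
time_energy = T                           # integral of |u(t)|^2 dt

def freq_energy(F, df=1e-3):
    """Midpoint-rule integral of |T sinc(fT)|^2 over [-F, F]."""
    n = int(round(2 * F / df))
    total = 0.0
    for i in range(n):
        f = -F + (i + 0.5) * df
        total += (T * sinc(f * T)) ** 2
    return total * df

for F in (1.0, 10.0, 100.0):
    print(F, freq_energy(F), time_energy)   # slowly approaches T as F grows
```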


Next, we discuss the significance of the Fourier transform in understanding the effect of LTI systems.

Transfer function for an LTI system: The transfer function $H(f)$ of an LTI system is defined to be the Fourier transform of its impulse response $h(t)$. That is, $H(f) = \int_{-\infty}^{\infty} h(t)e^{-j2\pi ft}\,dt$. We now discuss its significance.

From (2.37), we know that, when the input to an LTI system is the complex exponential $e^{j2\pi f_0 t}$, the output is given by $H(f_0)e^{j2\pi f_0 t}$. From the inverse Fourier transform (2.53), we know that any input $u(t)$ can be expressed as a linear combination of complex exponentials. Thus, the corresponding response, which we know is given by $y(t) = (u*h)(t)$, must be a linear combination of the responses to these complex exponentials. Thus, we have

$$y(t) = \int_{-\infty}^{\infty} U(f)H(f)e^{j2\pi ft}\,df$$

We recognize that the preceding function is in the form of an inverse Fourier transform, and read off $Y(f) = U(f)H(f)$. That is, the Fourier transform of the output is simply the product of the Fourier transform of the input and the system transfer function. This is because complex exponentials at different frequencies propagate through an LTI system without mixing with each other, with a complex exponential at frequency $f$ passing through with a scaling of $H(f)$.

Of course, we have also derived an expression for $y(t)$ in terms of a convolution of the input signal with the system impulse response: $y(t) = (u*h)(t)$. We can now infer the following key property.

Convolution in the time domain corresponds to multiplication in the frequency domain

$$y(t) = (u*h)(t) \leftrightarrow Y(f) = U(f)H(f) \qquad (2.56)$$

We can also infer the following dual property, either by using duality or by directly deriving it from first principles.

Multiplication in the time domain corresponds to convolution in the frequency domain

$$y(t) = u(t)v(t) \leftrightarrow Y(f) = (U*V)(f) \qquad (2.57)$$
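The convolution property (2.56) has an exact discrete counterpart for the DFT, with circular convolution in place of convolution. The Python sketch below uses made-up sample values, zero-pads both sequences to length $N_u + N_h - 1$ so that circular convolution coincides with ordinary (linear) convolution, and then checks that the DFT of the output equals the product of the DFTs. The helper names are illustrative, not from any library.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT as in (2.60)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * m * n / N) for n in range(N))
            for m in range(N)]


def circular_convolve(u, h):
    """Circular convolution: y[n] = sum_k u[k] h[(n - k) mod N]."""
    N = len(u)
    return [sum(u[k] * h[(n - k) % N] for k in range(N)) for n in range(N)]


def zero_pad(x, N):
    """Append zeros so that x has length N."""
    return list(x) + [0.0] * (N - len(x))


u = [1.0, -2.0, 3.0, 0.5]               # illustrative input samples
h = [1.0, 0.0, -0.5, 0.0, 0.5]          # illustrative impulse response samples
N = len(u) + len(h) - 1                 # padding length so circular = linear convolution

u_pad, h_pad = zero_pad(u, N), zero_pad(h, N)
y = circular_convolve(u_pad, h_pad)
linear = [sum(u[k] * h[n - k] for k in range(len(u)) if 0 <= n - k < len(h))
          for n in range(N)]

Y = dft(y)
UH = [a * b for a, b in zip(dft(u_pad), dft(h_pad))]
print(max(abs(a - b) for a, b in zip(y, linear)), max(abs(a - b) for a, b in zip(Y, UH)))
```

Without the zero padding, the DFT product would correspond to a wrapped-around (circular) convolution, which generally differs from $(u*h)$.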

LTI system response to real-valued sinusoidal signals: For a sinusoidal input $u(t) = \cos(2\pi f_0 t + \theta)$, the response of an LTI system $h$ is given by

$$y(t) = (u*h)(t) = |H(f_0)|\cos\left(2\pi f_0 t + \theta + \angle H(f_0)\right)$$

This can be inferred from what we know about the response for complex exponentials, thanks to Euler’s formula. Specifically, we have

$$u(t) = \frac{1}{2}\left(e^{j(2\pi f_0 t + \theta)} + e^{-j(2\pi f_0 t + \theta)}\right) = \frac{e^{j\theta}}{2}e^{j2\pi f_0 t} + \frac{e^{-j\theta}}{2}e^{-j2\pi f_0 t}$$

When $u$ goes through an LTI system with transfer function $H(f)$, the output is given by

$$y(t) = \frac{e^{j\theta}}{2}H(f_0)e^{j2\pi f_0 t} + \frac{e^{-j\theta}}{2}H(-f_0)e^{-j2\pi f_0 t}$$

If the system is physically realizable, the impulse response $h(t)$ is real-valued, and the transfer function is conjugate symmetric. Thus, if $H(f_0) = Ge^{j\phi}$ ($G > 0$), then $H(-f_0) = H^*(f_0) = Ge^{-j\phi}$. Substituting, we obtain

$$y(t) = \frac{G}{2}e^{j(2\pi f_0 t + \theta + \phi)} + \frac{G}{2}e^{-j(2\pi f_0 t + \theta + \phi)} = G\cos(2\pi f_0 t + \theta + \phi)$$

This yields the well-known result that the sinusoid gets scaled by the magnitude of the transfer function, $G = |H(f_0)|$, and gets phase shifted by the phase of the transfer function, $\phi = \angle H(f_0)$.
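To see this result numerically, the Python sketch below passes a cosine through a real-valued LTI system of the form $h(t) = \sum_k a_k\delta(t - \tau_k)$ (the gains and delays are borrowed from the multipath example that follows, purely for illustration). It compares the direct output, a sum of scaled and delayed cosines, with $|H(f_0)|\cos(2\pi f_0 t + \theta + \angle H(f_0))$; the values of $f_0$ and $\theta$ are arbitrary.

```python
import cmath
import math

# Illustrative real-valued LTI system h(t) = sum_k a_k delta(t - tau_k).
gains = [1.0, -0.5, 0.5]
delays = [1.0, 1.5, 3.5]


def H(f):
    """Transfer function: H(f) = sum_k a_k exp(-j 2 pi f tau_k)."""
    return sum(a * cmath.exp(-2j * math.pi * f * tau) for a, tau in zip(gains, delays))


def output_direct(t, f0, theta):
    """(u * h)(t) for u(t) = cos(2 pi f0 t + theta): scaled, delayed copies of the input."""
    return sum(a * math.cos(2 * math.pi * f0 * (t - tau) + theta)
               for a, tau in zip(gains, delays))


def output_via_H(t, f0, theta):
    """|H(f0)| cos(2 pi f0 t + theta + angle H(f0))."""
    G, phi = abs(H(f0)), cmath.phase(H(f0))
    return G * math.cos(2 * math.pi * f0 * t + theta + phi)


f0, theta = 0.3, 0.7
for t in [0.0, 0.25, 1.9, 4.2]:
    print(t, output_direct(t, f0, theta), output_via_H(t, f0, theta))
```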

Example 2.5.4 (Delay spread, coherence bandwidth, and fading for a multipath channel) The transfer function of a multipath channel as in (2.36) is given by

$$H(f) = a_1e^{-j2\pi f\tau_1} + \cdots + a_me^{-j2\pi f\tau_m} \qquad (2.58)$$

Thus, the channel transfer function is a linear combination of complex exponentials in the frequency domain. As with any sinusoids, these can interfere constructively or destructively, leading to significant fluctuations in $H(f)$ as $f$ varies. For wireless channels, this phenomenon is called frequency-selective fading. Let us examine the structure of the fading a little further. Suppose, without loss of generality, that the delays are in increasing order (i.e., $\tau_1 < \tau_2 < \cdots < \tau_m$). We can then rewrite the transfer function as

$$H(f) = e^{-j2\pi f\tau_1}\sum_{k=1}^{m} a_ke^{-j2\pi f(\tau_k - \tau_1)}$$

The first factor corresponds simply to a pure delay $\tau_1$ (seen by all frequencies), and can be dropped (taking $\tau_1$ as our time origin, without loss of generality), so that the transfer function can be rewritten as

$$H(f) = a_1 + \sum_{k=2}^{m} a_ke^{-j2\pi f(\tau_k - \tau_1)} \qquad (2.59)$$

The period of the $k$th sinusoid above ($k \ge 2$) is $1/(\tau_k - \tau_1)$, so that the smallest period, and hence the fastest fluctuations as a function of $f$, occurs because of the largest delay difference $\tau_d = \tau_m - \tau_1$, which we call the channel delay spread. Thus, for a frequency interval which is significantly smaller than $1/\tau_d$, the variation of $|H(f)|$ over the interval is small. We define the channel coherence bandwidth as the inverse of the delay spread, i.e., as $B_c = 1/(\tau_m - \tau_1) = 1/\tau_d$ (this definition is not unique, but in general, the coherence bandwidth is defined to be inversely proportional to some appropriately defined measure of the channel delay spread). As we have noted, $H(f)$ can be well modeled as constant over intervals significantly smaller than the coherence bandwidth.

Let us apply this to the example in Figure 2.14, where we have a multipath channel with impulse response $h(t) = \delta(t-1) - 0.5\delta(t-1.5) + 0.5\delta(t-3.5)$. Dropping the first delay as before, we have

$$H(f) = 1 - 0.5e^{-j\pi f} + 0.5e^{-j5\pi f}$$

For concreteness, suppose that time is measured in microseconds (typical numbers for an outdoor wireless cellular link), so that frequency is measured in MHz. The delay spread is 2.5 $\mu$s, hence the coherence bandwidth is 400 KHz. We therefore ballpark the size of the frequency interval over which $H(f)$ can be approximated as constant to about 40 KHz (i.e., 10% of the coherence bandwidth). Note that this is a very fuzzy estimate: if the larger delays occur with smaller relative amplitudes, as is typical, then they have a smaller effect on $H(f)$, and we could potentially approximate $H(f)$ as constant over a larger fraction of the coherence bandwidth.
Figure 2.22 depicts the fluctuations in $H(f)$, first on a linear scale, and then on a log scale. A plot of the transfer function magnitude is shown in Figure 2.22(a). This is the amplitude gain on a linear scale, and shows significant variations as a function of $f$ (while we do not show it here, zooming in to 40 KHz bands shows relatively small fluctuations). The amount of fluctuation becomes even more apparent on a log scale. Interpreting the gain at the smallest delay ($a_1 = 1$ in our case) as that of a nominal channel, the fading gain is defined as the power gain relative to this nominal, and is given by $20\log_{10}(|H(f)|/|a_1|)$ decibels (dB). This is shown in Figure 2.22(b). Note that the fading gain can dip to about $-18$ dB in our example, which we term a fade of depth 18 dB. If we are using a “narrowband” signal which has a bandwidth small compared to the coherence bandwidth, and happen to get hit by such a fade, then we can expect much poorer performance than nominal. To combat this, one must use diversity. For example, a “wideband” signal whose bandwidth is larger than the coherence bandwidth provides frequency diversity, while, if we are constrained to use narrowband signals, we may need to introduce other forms of diversity (e.g., antenna diversity as in Software Lab 2.2).


Figure 2.22: Multipath propagation causes severe frequency-selective fading. (Panel (a): transfer function magnitude, linear scale.)
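The numbers in Example 2.5.4 can be reproduced with a short computation. The Python sketch below evaluates $H(f) = 1 - 0.5e^{-j\pi f} + 0.5e^{-j5\pi f}$ (frequency in MHz) on a fine grid over one period of $H(f)$, which is 2 MHz here because the two exponentials have periods 2 MHz and 0.4 MHz, and reports the coherence bandwidth and the deepest fade. The grid spacing is an arbitrary choice.

```python
import cmath
import math

# Example channel with the first delay dropped (time in microseconds, frequency in MHz):
# h(t) = delta(t) - 0.5 delta(t - 0.5) + 0.5 delta(t - 2.5)
gains = [1.0, -0.5, 0.5]
rel_delays = [0.0, 0.5, 2.5]


def H(f):
    """H(f) = 1 - 0.5 exp(-j pi f) + 0.5 exp(-j 5 pi f)."""
    return sum(a * cmath.exp(-2j * math.pi * f * tau) for a, tau in zip(gains, rel_delays))


delay_spread = max(rel_delays) - min(rel_delays)    # tau_d = 2.5 us
coherence_bw = 1.0 / delay_spread                   # B_c = 0.4 MHz

# Fading gain 20 log10(|H(f)| / |a_1|) in dB over one 2 MHz period of H(f).
df = 1e-4
freqs = [k * df for k in range(int(round(2.0 / df)))]
fading_db = [20 * math.log10(abs(H(f)) / abs(gains[0])) for f in freqs]
k_min = min(range(len(freqs)), key=lambda k: fading_db[k])

print(f"coherence bandwidth = {coherence_bw * 1e3:.0f} KHz")
print(f"deepest fade = {fading_db[k_min]:.1f} dB at f = {freqs[k_min] * 1e3:.0f} KHz")
```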


2.5.2  Numerical  computation  using  DFT 

In many practical settings, we do not have nice analytical expressions for the Fourier or inverse Fourier transforms, and must resort to numerical computation, typically using the discrete Fourier transform (DFT). The DFT of a discrete time sequence $\{u[n], n = 0, \ldots, N-1\}$ of length $N$ is given by

$$U[m] = \sum_{n=0}^{N-1} u[n]e^{-j2\pi mn/N}, \quad m = 0, 1, \ldots, N-1. \qquad (2.60)$$

Matlab computes DFTs efficiently using a procedure called the Fast Fourier Transform (FFT), which is especially efficient when $N$ is a power of 2. Comparing (2.60) with the Fourier transform expression


$$U(f) = \int_{-\infty}^{\infty} u(t)e^{-j2\pi ft}\,dt \qquad (2.61)$$

we can view the sum in the DFT (2.60) as an approximation for the integral in (2.61) under the right set of conditions. Let us first assume that $u(t) = 0$ for $t < 0$: any waveform which can be truncated so that most of its energy falls in a finite interval can be shifted so that this is true.
Next, suppose that we sample the waveform with spacing $t_s$ to get

$$u[n] = u(nt_s)$$

Now, suppose we want to compute the Fourier transform $U(f)$ for $f = mf_s$, where $f_s$ is the desired frequency resolution. We can approximate the integral for the Fourier transform by a sum, using $t_s$-spaced time samples as follows:

$$U(mf_s) = \int_{-\infty}^{\infty} u(t)e^{-j2\pi mf_s t}\,dt \approx \sum_n u(nt_s)e^{-j2\pi mf_s nt_s}\,t_s$$

($dt$ in the integral is replaced by the sample spacing $t_s$.) Since $u[n] = u(nt_s)$, the approximation can be computed using the DFT formula (2.60) as follows:

$$U(mf_s) \approx t_s\,U[m] \qquad (2.62)$$

as long as $f_st_s = \frac{1}{N}$. That is, using a DFT of length $N$, we can get a frequency granularity of $f_s = \frac{1}{Nt_s}$. This implies that if we choose the time samples close together (in order to represent $u(t)$ accurately), then we must also use a large $N$ to get a desired frequency granularity. Often this means that we must pad the time domain samples with zeros.

Another important observation is that, while the DFT in (2.60) ranges over $m = 0, \ldots, N-1$, it actually computes the Fourier transform for both positive and negative frequencies. Noting that $e^{j2\pi mn/N} = e^{j2\pi(m-N)n/N}$, we realize that the DFT values for $m = N/2, \ldots, N-1$ correspond to the Fourier transform evaluated at frequencies $(m-N)f_s = -(N/2)f_s, \ldots, -f_s$. The DFT values for $m = 0, \ldots, N/2-1$ correspond to the Fourier transform evaluated at frequencies $0, f_s, \ldots, (N/2-1)f_s$. Thus, we should swap the left and right halves of the DFT output in order to represent positive and negative frequencies, with DC falling in the middle. Matlab actually has a function, fftshift, that does this.
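The bookkeeping for the frequency axis can be sketched in Python as follows. The helper names are hypothetical; `fftshift` here is a hand-written stand-in that mirrors the half swap performed by Matlab’s fftshift on a vector, and the values of $N$ and $t_s$ are arbitrary.

```python
def dft_bin_frequency(m, N, fs):
    """Frequency represented by DFT bin m: m*fs for m < N/2, (m - N)*fs for m >= N/2."""
    return m * fs if m < N / 2 else (m - N) * fs


def fftshift(x):
    """Hand-written half swap so that DC lands in the middle (like Matlab's fftshift)."""
    N = len(x)
    k = (N + 1) // 2
    return list(x[k:]) + list(x[:k])


def centered_frequencies(N, fs):
    """Frequency axis of the shifted output for even N: (k - N/2) * fs, k = 0..N-1."""
    return [(k - N // 2) * fs for k in range(N)]


N, ts = 8, 0.125                 # illustrative DFT length and sampling interval
fs = 1 / (N * ts)                # frequency granularity (here 1.0)
by_bin = [dft_bin_frequency(m, N, fs) for m in range(N)]
print(by_bin)                    # DFT order: nonnegative frequencies first, then negative ones
print(fftshift(by_bin))          # increasing order with DC in the middle
print(centered_frequencies(N, fs))
```

The last line is the Python counterpart of the axis `freqs = ((1:Nfft)-1-Nfft/2)*fs` used in Code Fragment 2.5.1.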

Note that the DFT (2.60) is periodic with period $N$, so that the Fourier transform approximation (2.62) is periodic with period $Nf_s = \frac{1}{t_s}$. We typically limit the range of frequencies over which we use the DFT to compute the Fourier transform to the fundamental period $\left(-\frac{1}{2t_s}, \frac{1}{2t_s}\right)$. This is consistent with the sampling theorem, which says that the sampling rate $1/t_s$ must be at least as large as the size of the frequency band of interest. (The sampling theorem is reviewed in Chapter 4, when we discuss digital modulation.)


Example 2.5.5 (DFT-based Fourier transform computation) Suppose that we want to compute the Fourier transform of the sine pulse $u(t) = \sin(\pi t)\,I_{[0,1]}(t)$. The Fourier transform for this can be computed analytically (see Problem 2.9) to be

$$U(f) = \frac{2\cos\pi f}{\pi(1-4f^2)}\,e^{-j\pi f} \qquad (2.63)$$

Note that $U(f)$ has a 0/0 form at $f = 1/2$, but using L’Hospital’s rule, we can show that $U(1/2) \neq 0$. Thus, the first zeros of $U(f)$ are at $f = \pm 3/2$. This is a timelimited pulse and hence cannot be bandlimited, but $U(f)$ decays as $1/f^2$ for $f$ large, so we can capture most of the energy of the pulse within a suitably chosen finite frequency interval. Let us use the DFT to compute $U(f)$ over $f \in (-8, 8)$. This means that we set $1/(2t_s) = 8$, or $t_s = 1/16$, which yields about 16 samples over the interval $[0, 1]$ over which the signal $u(t)$ has support. Suppose now that we want the frequency granularity to be at least as fine as $f_s = 1/160$. Then we must use a DFT with $N \ge \frac{1}{f_st_s} = 2560 = N_{min}$. In order to efficiently compute the DFT using the FFT, we choose $N = 4096$, the next power of 2 at least as large as $N_{min}$. Code Fragment 2.5.1 performs and plots this DFT. The resulting plot (with cosmetic touches not included in the code below) is displayed in Figure 2.23. It is useful to compare this with a plot obtained from the analytical formula (2.63), and we leave that as an exercise.
Figure 2.23: Plot of magnitude spectrum of sine pulse in Example 2.5.5 obtained numerically using the DFT.


Code  Fragment  2.5.1  Numerical  computation  of  Fourier  transform  using  FFT 
ts = 1/16; % sampling interval
time_interval = 0:ts:1; % sampling time instants
% time domain signal evaluated at sampling instants
signal_timedomain = sin(pi*time_interval); % sinusoidal pulse in our example
fs_desired = 1/160; % desired frequency granularity
Nmin = ceil(1/(fs_desired*ts)); % minimum length DFT for desired frequency granularity
% for efficient computation, choose FFT size to be a power of 2
Nfft = 2^(nextpow2(Nmin)); % FFT size = the next power of 2 at least as big as Nmin
% Alternatively, one could also use DFT size equal to the minimum length
% Nfft = Nmin;
% note: fft computes the DFT for any Nfft, but is typically fastest when Nfft is a power of 2
% freq domain signal computed using DFT
% fft function of size Nfft automatically zeropads as needed
signal_freqdomain = ts*fft(signal_timedomain, Nfft);
% fftshift function shifts DC to center of spectrum
signal_freqdomain_centered = fftshift(signal_freqdomain);
fs = 1/(Nfft*ts); % actual frequency resolution attained
% set of frequencies for which Fourier transform has been computed using DFT
freqs = ((1:Nfft)-1-Nfft/2)*fs;
% plot the magnitude spectrum
plot(freqs, abs(signal_freqdomain_centered));
xlabel('Frequency');
ylabel('Magnitude Spectrum');
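As a spot check of (2.62) against the analytical formula (2.63), the Python sketch below uses the parameters of Example 2.5.5 and evaluates individual DFT bins of the zero-padded sequence directly (only the 17 nonzero samples contribute), scaled by $t_s$. Evaluating single bins instead of calling an FFT library is a simplification to keep the sketch dependency-free; exact fractions are used only so that the ceiling in $N_{min}$ is not affected by round-off.

```python
import cmath
import math
from fractions import Fraction

# Parameters of Example 2.5.5.
ts_exact = Fraction(1, 16)                      # sampling interval
fs_desired = Fraction(1, 160)                   # desired frequency granularity
N_min = math.ceil(1 / (fs_desired * ts_exact))  # minimum DFT length: 2560
N = 1 << (N_min - 1).bit_length()               # next power of 2 >= N_min: 4096

ts = float(ts_exact)
fs = 1 / (N * ts)                               # attained granularity: 1/256
u = [math.sin(math.pi * n * ts) for n in range(17)]   # samples at t = 0, ts, ..., 1


def dft_bin_scaled(m):
    """ts * U[m] for the length-N zero-padded sequence; the padded zeros add nothing."""
    return ts * sum(x * cmath.exp(-2j * math.pi * m * n / N) for n, x in enumerate(u))


def U_exact(f):
    """Analytical transform (2.63); at f = +-1/2 the real factor takes its limit value 1/2."""
    if abs(1 - 4 * f * f) < 1e-12:
        amp = 0.5
    else:
        amp = 2 * math.cos(math.pi * f) / (math.pi * (1 - 4 * f * f))
    return amp * cmath.exp(-1j * math.pi * f)


for f in [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0]:
    m = round(f / fs) % N          # negative frequencies live in bins m >= N/2
    print(f, abs(dft_bin_scaled(m)), abs(U_exact(f)))
```

The small residual differences come from replacing the integral by a sum with spacing $t_s = 1/16$; they shrink as $t_s$ is reduced.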


2.6  Energy  Spectral  Density  and  Bandwidth 

Communication channels have frequency-dependent characteristics, hence it is useful to appropriately shape the frequency domain characteristics of the signals sent over them. Furthermore, for wireless communication systems, frequency spectrum is a particularly precious commodity, since wireless is a broadcast medium to be shared by multiple signals. It is therefore important to quantify the frequency occupancy of communication signals. We provide a first exposure to these concepts here via the notion of energy spectral density for finite energy signals. These ideas are extended to finite power signals, for which we can define the analogous concept of power spectral density, in Chapter 4, “just in time” for our discussion of the spectral occupancy of digitally modulated signals. Once we know the energy or power spectral density of a signal, we shall see that there are a number of possible definitions of bandwidth, which is a measure of the size of the frequency interval occupied by the signal.


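[Figure 2.24 block diagram: $u(t)$ is passed through an ideal narrowband filter $H(f)$ of unit height and width $\Delta f$ centered at $f^*$, followed by an energy meter whose reading is approximately $E_u(f^*)\,\Delta f$.]

Figure 2.24: Operational definition of energy spectral density.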


Energy Spectral Density: The energy spectral density $E_u(f)$ of a signal $u(t)$ can be defined operationally as shown in Figure 2.24. Pass the signal $u(t)$ through an ideal narrowband filter with transfer function as follows:

$$H_{f^*}(f) = \begin{cases} 1, & f^* - \frac{\Delta f}{2} < f < f^* + \frac{\Delta f}{2} \\ 0, & \text{else} \end{cases}$$


The energy spectral density $E_u(f^*)$ is defined to be the energy at the output of the filter, divided by the width $\Delta f$ (in the limit as $\Delta f \to 0$). That is, the energy at the output of the filter is approximately $E_u(f^*)\Delta f$. But the Fourier transform of the filter output is

$$Y(f) = U(f)H_{f^*}(f) = \begin{cases} U(f), & f^* - \frac{\Delta f}{2} < f < f^* + \frac{\Delta f}{2} \\ 0, & \text{else} \end{cases}$$

By Parseval’s identity, the energy at the output of the filter is

$$\int_{-\infty}^{\infty} |Y(f)|^2\,df = \int_{f^* - \Delta f/2}^{f^* + \Delta f/2} |U(f)|^2\,df \approx |U(f^*)|^2\,\Delta f$$

assuming that $U(f)$ varies smoothly and $\Delta f$ is small enough. We can now infer that the energy spectral density is simply the magnitude squared of the Fourier transform:

$$E_u(f) = |U(f)|^2 \qquad (2.64)$$

The integral of the energy spectral density equals the signal energy, which is consistent with Parseval’s identity.

The inverse Fourier transform of the energy spectral density has a nice intuitive interpretation. Noting that $|U(f)|^2 = U(f)U^*(f)$ and $U^*(f) \leftrightarrow u^*(-t)$, let us define $u_{MF}(t) = u^*(-t)$ as (the impulse response of) the matched filter for $u(t)$, where the reasons for this term will be clarified later. Then

$$|U(f)|^2 = U(f)U^*(f) \leftrightarrow (u * u_{MF})(\tau) = \int u(t)\,u_{MF}(\tau - t)\,dt = \int u(t)\,u^*(t - \tau)\,dt \qquad (2.65)$$

where $t$ is a dummy variable for the integration, and the convolution is evaluated at the time variable $\tau$, which denotes the delay between the two versions of $u$ being correlated: the extreme right-hand side is simply the correlation of $u$ with itself (after complex conjugation), evaluated at different delays $\tau$. We call this the autocorrelation function of the signal $u$. We have therefore shown the following.

For  a  finite  energy  signal,  the  energy  spectral  density  and  the  autocorrelation  function  form  a 
Fourier  transform  pair. 
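A discrete analogue of this statement can be checked numerically: with the DFT (2.60), the DFT of the circular autocorrelation $r[k] = \sum_n u[n]u^*[n-k]$ (indices taken mod $N$) equals $|U[m]|^2$. The Python sketch below assumes an arbitrary complex test sequence; zero-padding it to length $2N-1$ first would make the circular autocorrelation coincide with the ordinary (linear) one.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT as in (2.60)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * m * n / N) for n in range(N))
            for m in range(N)]


def circular_autocorrelation(u):
    """r[k] = sum_n u[n] conj(u[(n - k) mod N]): discrete, circular version of (2.65)."""
    N = len(u)
    return [sum(u[n] * u[(n - k) % N].conjugate() for n in range(N)) for k in range(N)]


u = [1.0 + 0.5j, -0.25, 2.0j, 0.75 - 1.0j, 0.0, 1.5]    # arbitrary test sequence
U = dft(u)
R = dft(circular_autocorrelation(u))
esd = [abs(X) ** 2 for X in U]                          # |U[m]|^2
print(max(abs(a - b) for a, b in zip(R, esd)))
```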

Bandwidth: The bandwidth of a signal $u(t)$ is loosely defined to be the size of the band of frequencies occupied by $U(f)$. The definition is “loose” because the concept of occupancy can vary, depending on the application, since signals are seldom strictly bandlimited. One possibility is to consider the band over which $|U(f)|^2$ is within some fraction of its peak value (setting the fraction equal to $\frac{1}{2}$ corresponds to the 3 dB bandwidth). Alternatively, we might be interested in energy containment bandwidth, which is the size of the smallest band which contains a specified fraction of the signal energy (for a finite power signal, we define analogously the power containment bandwidth).

Only positive frequencies count when computing bandwidth for physical (real-valued) signals: For physically realizable (i.e., real-valued) signals, bandwidth is defined as the signal’s occupancy of positive frequencies, because conjugate symmetry implies that the information at negative frequencies is redundant.

While physically realizable time domain signals are real-valued, we shall soon introduce complex-valued signals that have useful physical interpretation, in the sense that they have a well-defined mapping to physically realizable signals. Conjugate symmetry in the frequency domain does not hold for complex-valued time domain signals, with different information contained in positive and negative frequencies in general. Thus, the bandwidth for a complex-valued signal is defined as the size of the frequency band it occupies over both positive and negative frequencies. The justification for this convention becomes apparent later in this chapter.

Example 2.6.1 (Some bandwidth computations)

(a) Consider $u(t) = \mathrm{sinc}(2t)$, where the unit of time is microseconds. Then the unit of frequency is MHz, and $U(f) = \frac{1}{2}I_{[-1,1]}(f)$ is strictly bandlimited, with bandwidth 1 MHz (counting only positive frequencies, since $u(t)$ is real-valued).

(b) Now, consider the timelimited waveform $u(t) = I_{[2,4]}(t)$, where the unit of time is microseconds. Then $U(f) = 2\,\mathrm{sinc}(2f)\,e^{-j6\pi f}$, which is not bandlimited. The 99% energy containment bandwidth $W$ is defined by the equation

$$\int_{-W}^{W} |U(f)|^2\,df = 0.99\int_{-\infty}^{\infty} |U(f)|^2\,df = 0.99\int_{-\infty}^{\infty} |u(t)|^2\,dt = 0.99\int_{2}^{4} 1\,dt = 1.98$$

where we use Parseval’s identity to simplify computation for timelimited waveforms. Using the fact that $|U(f)|$ is even, we obtain that

$$1.98 = 2\int_0^W |U(f)|^2\,df = 2\int_0^W 4\,\mathrm{sinc}^2(2f)\,df$$

We can now solve numerically to obtain $W \approx 5.1$ MHz.
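One way to carry out this numerical solution is to accumulate the integral with the trapezoidal rule until it reaches 1.98, as in the Python sketch below. The step size is an arbitrary choice; a root finder applied to the integral would work equally well. Because the integrand is nonnegative, the accumulated energy increases monotonically, so the first crossing gives $W$.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def esd(f):
    """|U(f)|^2 = 4 sinc^2(2f) for u(t) = I_[2,4](t); time in us, frequency in MHz."""
    return 4.0 * sinc(2.0 * f) ** 2


total_energy = 2.0                  # integral of |u(t)|^2 over [2, 4]
target = 0.99 * total_energy        # 1.98

df = 1e-4                           # integration step in MHz (arbitrary choice)
k = 0
contained = 0.0                     # running value of 2 * integral_0^W |U(f)|^2 df
while contained < target:
    f_lo, f_hi = k * df, (k + 1) * df
    contained += 2.0 * 0.5 * (esd(f_lo) + esd(f_hi)) * df   # trapezoidal rule, both sides
    k += 1
W = k * df
print(f"99% energy containment bandwidth W = {W:.3f} MHz")
```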


2.7  Baseband  and  Passband  Signals 

Baseband: A signal $u(t)$ is said to be baseband if the signal energy is concentrated in a band around DC, and

$$U(f) \approx 0, \quad |f| > W \qquad (2.66)$$

for some $W > 0$. Similarly, a channel modeled as a linear time invariant system is said to be baseband if its transfer function $H(f)$ has support concentrated around DC, and satisfies (2.66).
Figure 2.25: Example of the spectrum $U(f)$ for a real-valued baseband signal. The bandwidth of the signal is $W$.
Figure 2.26: Example of the spectrum $U(f)$ for a real-valued passband signal. The bandwidth of the signal is $W$. The figure shows an arbitrarily chosen frequency $f_c$ within the band in which $U(f)$ is nonzero. Typically, $f_c$ is much larger than the signal bandwidth $W$.
A signal $u(t)$ is said to be passband if its energy is concentrated in a band away from DC, with

$$U(f) \approx 0, \quad |f \pm f_c| > W \qquad (2.67)$$

where $f_c > W > 0$. A channel modeled as a linear time invariant system is said to be passband if its transfer function $H(f)$ satisfies (2.67).

Examples of baseband and passband signals are shown in Figures 2.25 and 2.26, respectively. Physically realizable signals must be real-valued in the time domain, which means that their Fourier transforms, which can be complex-valued, must be conjugate symmetric: $U(-f) = U^*(f)$. As discussed earlier, the bandwidth $B$ for a real-valued signal $u(t)$ is the size of the frequency interval (counting only positive frequencies) occupied by $U(f)$.

Information  sources  typically  emit  baseband  signals.  For  example,  an  analog  audio  signal  has 
significant frequency content ranging from DC to around 20 KHz. A digital signal in which zeros
and  ones  are  represented  by  pulses  is  also  a  baseband  signal,  with  the  frequency  content  governed 
by  the  shape  of  the  pulse  (as  we  shall  see  in  more  detail  in  Chapter  4).  Even  when  the  pulse  is 
timelimited,  and  hence  not  strictly  bandlimited,  most  of  the  energy  is  concentrated  in  a  band 
around  DC. 

Wired channels (e.g., telephone lines, USB connectors) are typically modeled as baseband: the attenuation over the wire increases with frequency, so that it makes sense to design the transmitted signal to utilize a frequency band around DC. An example of passband communication over a wire is Digital Subscriber Line (DSL), where high speed data transmission using frequencies above 25 KHz co-exists with voice transmission in the band from 0-4 KHz. The design and use of passband signals for communication is particularly important for wireless communication, in which the transmitted signals must fit within frequency bands dictated by regulatory agencies, such as the Federal Communications Commission (FCC) in the United States. For example, an amplitude modulation (AM) radio signal typically occupies a frequency interval of length 10 KHz somewhere in the 540-1600 KHz band allocated for AM radio. Thus, the baseband audio message signal must be transformed into a passband signal before it can be sent over the passband channel spanning the desired band. As another example, a transmitted signal in a WiFi network may be designed to fit within a 20 MHz frequency interval in the 2.4 GHz unlicensed band, so that digital messages to be sent over WiFi must be encoded onto passband signals occupying the designated spectral band.


2.8  The  Structure  of  a  Passband  Signal 

In  order  to  employ  a  passband  channel  for  communication,  we  need  to  understand  how  to  design 
a  passband  transmitted  signal  to  carry  information,  and  how  to  recover  this  information  from  a 
passband  received  signal.  We  also  need  to  understand  how  the  transmitted  signal  is  affected  by 
a  passband  channel. 


2.8.1  Time  Domain  Relationships 

Let us start by considering a real-valued baseband message signal $m(t)$ of bandwidth $W$, to be sent over a passband channel centered around $f_c$. As illustrated in Figure 2.27, we can translate the message to passband simply by multiplying it by a sinusoid at $f_c$:

$$u_p(t) = m(t)\cos 2\pi f_c t \leftrightarrow U_p(f) = \frac{1}{2}\left(M(f - f_c) + M(f + f_c)\right)$$

We use the term carrier frequency for $f_c$, and the term carrier for a sinusoid at the carrier frequency, since the modulated sinusoid is “carrying” the message information over a passband channel. Instead of a cosine, we could also use a sine:

$$v_p(t) = m(t)\sin 2\pi f_c t \leftrightarrow V_p(f) = \frac{1}{2j}\left(M(f - f_c) - M(f + f_c)\right)$$

Figure 2.27: A baseband message of bandwidth $W$ is translated to passband by multiplying by a sinusoid at frequency $f_c$, as long as $f_c > W$.

Note that $U_p(f)$ and $V_p(f)$ have frequency content in bands around $\pm f_c$, so that $u_p(t)$ and $v_p(t)$ are passband signals (i.e., living in a band not containing DC) as long as $f_c > W$.

I and Q components: If we use both the cosine and sine carriers, we can construct a passband signal of the form

$$u_p(t) = u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t \qquad (2.68)$$

where $u_c$ and $u_s$ are real baseband signals of bandwidth at most $W$, with $f_c > W$. The signal $u_c(t)$ is called the in-phase (or I) component, and $u_s(t)$ is called the quadrature (or Q) component. The negative sign for the Q term is a standard convention. Since the sinusoidal terms are entirely predictable once we specify $f_c$, all information in the passband signal $u_p$ must be contained in
the  I  and  Q  components.  Modulation  for  a  passband  channel  therefore  corresponds  to  choosing 
a  method  of  encoding  information  into  the  I  and  Q  components  of  the  transmitted  signal,  while 
demodulation  corresponds  to  extracting  this  information  from  the  received  passband  signal.  In 
order  to  accomplish  modulation  and  demodulation,  we  must  be  able  to  upconvert  from  baseband 
to  passband,  and  downconvert  from  passband  to  baseband,  as  follows. 


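[Figure 2.28 block diagram. Upconversion (baseband to passband): $u_c(t)$ is multiplied by $\cos 2\pi f_c t$ and $u_s(t)$ by $-\sin 2\pi f_c t$, and the two products are summed to form $u_p(t)$. Downconversion (passband to baseband): $u_p(t)$ is multiplied by $2\cos 2\pi f_c t$ and by $-2\sin 2\pi f_c t$, and each product is passed through a lowpass filter to yield $u_c(t)$ and $u_s(t)$, respectively.]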


Figure  2.28:  Upconversion  from  baseband  to  passband,  and  downconversion  from  passband  to 
baseband. 


Upconversion and downconversion: Equation (2.68) immediately tells us how to upconvert from baseband to passband. To downconvert from passband to baseband, consider

$$2u_p(t)\cos 2\pi f_c t = 2u_c(t)\cos^2 2\pi f_c t - 2u_s(t)\sin 2\pi f_c t\cos 2\pi f_c t = u_c(t) + u_c(t)\cos 4\pi f_c t - u_s(t)\sin 4\pi f_c t$$

The first term on the extreme right-hand side is the I component $u_c(t)$, a baseband signal. The second and third terms are passband signals at $2f_c$, which we can get rid of by lowpass filtering. Similarly, we can obtain the Q component $u_s(t)$ by lowpass filtering $-2u_p(t)\sin 2\pi f_c t$. Block diagrams for upconversion and downconversion are depicted in Figure 2.28. Implementation of these operations could, in practice, be done in multiple stages, and requires careful analog circuit design.
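The upconversion and downconversion steps can be simulated on sampled waveforms, as in the Python sketch below. All numbers (carrier frequency, sample rate, I and Q test tones) are illustrative assumptions. The lowpass filter is a deliberately crude centered moving average whose length is exactly one period of the $2f_c$ terms, so that it nulls them; a practical receiver would use a properly designed filter.

```python
import math

fc = 50.0            # carrier frequency in Hz (illustrative)
fsamp = 2100.0       # sample rate in Hz: one period of a 2*fc tone spans exactly 21 samples
half = 10            # moving-average window of 2*half + 1 = 21 samples
n_samples = 2100     # one second of data

t = [n / fsamp for n in range(n_samples)]
uc = [math.cos(2 * math.pi * 1.0 * x) for x in t]          # I component: 1 Hz tone
us = [0.5 * math.sin(2 * math.pi * 2.0 * x) for x in t]    # Q component: 2 Hz tone

# Upconversion, as in (2.68).
up = [c * math.cos(2 * math.pi * fc * x) - s * math.sin(2 * math.pi * fc * x)
      for c, s, x in zip(uc, us, t)]

# Downconversion, step 1: mix with 2cos and -2sin.
mix_i = [2 * p * math.cos(2 * math.pi * fc * x) for p, x in zip(up, t)]
mix_q = [-2 * p * math.sin(2 * math.pi * fc * x) for p, x in zip(up, t)]


def lowpass(x, n):
    """Downconversion, step 2: centered moving average over one period of the 2*fc terms."""
    return sum(x[n - half:n + half + 1]) / (2 * half + 1)


interior = range(half, n_samples - half)     # skip edges where the window runs off the data
err_i = max(abs(lowpass(mix_i, n) - uc[n]) for n in interior)
err_q = max(abs(lowpass(mix_q, n) - us[n]) for n in interior)
print(f"max I error = {err_i:.4f}, max Q error = {err_q:.4f}")
```

The remaining error comes from the I and Q components varying slightly within the averaging window; it shrinks as $f_c$ grows relative to the baseband bandwidth.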

We  now  dig  deeper  into  the  structure  of  a  passband  signal.  First,  can  we  choose  the  I  and  Q 
components  freely,  independent  of  each  other?  The  answer  is  yes:  the  I  and  Q  components 
provide  two  parallel,  orthogonal  “channels”  for  encoding  information,  as  we  show  next. 

Orthogonality of I and Q channels: The passband waveform $a_p(t) = u_c(t)\cos 2\pi f_c t$ corresponding to the I component, and the passband waveform $b_p(t) = u_s(t)\sin 2\pi f_c t$ corresponding to the Q component, are orthogonal. That is,

$$\langle a_p, b_p\rangle = 0 \qquad (2.69)$$

Let

$$x(t) = a_p(t)b_p(t) = u_c(t)u_s(t)\cos 2\pi f_c t\,\sin 2\pi f_c t = \frac{1}{2}u_c(t)u_s(t)\sin 4\pi f_c t$$

We prove the desired result by showing that $x(t)$ is a passband signal at $2f_c$, so that its DC component is zero. That is,

$$\int_{-\infty}^{\infty} x(t)\,dt = X(0) = 0$$

which is the desired result. To show this, note that

$$p(t) = \frac{1}{2}u_c(t)u_s(t) \leftrightarrow \frac{1}{2}(U_c * U_s)(f)$$

is a baseband signal: if $U_c(f)$ is baseband with bandwidth $W_1$ and $U_s(f)$ is baseband with bandwidth $W_2$, then their convolution has bandwidth at most $W_1 + W_2$. In order for $a_p$ to be passband, we must have $f_c > W_1$, and in order for $b_p$ to be passband, we must have $f_c > W_2$. Thus, $2f_c > W_1 + W_2$, which means that $x(t) = p(t)\sin 4\pi f_c t$ is passband around $2f_c$, and is therefore zero at DC. This completes the derivation.
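Orthogonality can also be checked numerically. The Python sketch below approximates $\langle a_p, b_p\rangle$ by a Riemann sum, using $f_c = 150$ Hz and $u_c(t) = I_{[0,1]}(t)$ (the carrier and I component of Example 2.8.1), together with a hypothetical triangular Q component $u_s(t) = \max(0, 1 - |t|)$ chosen purely for illustration. Because these components are timelimited rather than strictly bandlimited, the inner product comes out small compared with the norms, but not exactly zero.

```python
import math

fc = 150.0                      # carrier frequency in Hz
dt = 2e-5                       # Riemann-sum step in seconds
times = [-1.0 + k * dt for k in range(int(round(2.0 / dt)) + 1)]   # grid over [-1, 1]


def uc(t):
    """I component: indicator of [0, 1]."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


def us(t):
    """Hypothetical Q component for illustration: triangle 1 - |t| on [-1, 1]."""
    return max(0.0, 1.0 - abs(t))


def a_p(t):
    return uc(t) * math.cos(2 * math.pi * fc * t)


def b_p(t):
    return us(t) * math.sin(2 * math.pi * fc * t)


inner = sum(a_p(t) * b_p(t) for t in times) * dt
norm_a = math.sqrt(sum(a_p(t) ** 2 for t in times) * dt)
norm_b = math.sqrt(sum(b_p(t) ** 2 for t in times) * dt)
print(f"<a_p, b_p> = {inner:.2e}, normalized = {inner / (norm_a * norm_b):.2e}")
```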


Example 2.8.1 (Passband signal): The signal

$$u_p(t) = I_{[0,1]}(t)\cos 300\pi t - (1 - \ldots)\sin 300\pi t$$

is a passband signal with I component $u_c(t) = I_{[0,1]}(t)$ and Q component $u_s(t) = (1 - \ldots)$. This example illustrates that we do not require strict bandwidth limitations in our definitions of passband and baseband: the I and Q components are timelimited, and hence cannot be bandlimited. However, they are termed baseband signals because most of their energy lies in baseband. Similarly, $u_p(t)$ is termed a passband signal, since most of its frequency content lies in a small band around 150 Hz.


Envelope  and  phase:  Since  a  passband  signal  Up  is  equivalent  to  a  pair  of  real- valued  baseband 
waveforms  {uc,Us),  passband  modulation  is  often  called  two-dimensional  modulation.  The  repre¬ 
sentation  (2.68)  in  terms  of  I  and  Q  components  corresponds  to  thinking  of  this  two-dimensional 
waveform  in  rectangular  coordinates  (the  “cosine  axis”  and  the  “sine  axis”).  We  can  also  rep¬ 
resent  the  passband  waveform  using  polar  coordinates.  Consider  the  rectangular-polar  transfor¬ 
mation 


e{t)  =  \/ul{t)  +ul{t)  , 


e{t)  =  tan  ^ 

Uc[t) 
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where  e(t)  >  0  is  termed  the  envelope  and  9(t)  is  the  phase.  This  corresponds  to  udt)  = 
e(t)  cos  9(t)  and  Us(t)  =  e(t)  sm9(t).  Substituting  in  (2.68),  we  obtain 

$$u_p(t) = e(t)\cos\theta(t)\cos 2\pi f_c t - e(t)\sin\theta(t)\sin 2\pi f_c t = e(t)\cos\left(2\pi f_c t + \theta(t)\right) \qquad (2.70)$$

This  provides  an  alternate  representation  of  the  passband  signal  in  terms  of  baseband  envelope 
and  phase  signals. 
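As a quick numerical check of the rectangular-polar transformation, the sketch below computes $e(t)$ and $\theta(t)$ from sampled I and Q components and confirms that (2.68) and (2.70) produce the same passband waveform. It assumes NumPy; the carrier `fc` and the test waveforms are arbitrary illustrative choices. One design note: it uses the four-quadrant arctangent `np.arctan2(us, uc)` rather than a literal $\tan^{-1}(u_s/u_c)$, since the latter cannot distinguish, for example, $(u_c, u_s) = (1, 1)$ from $(-1, -1)$.

```python
import numpy as np


def iq_to_polar(uc, us):
    """Map I/Q samples to envelope e(t) >= 0 and phase theta(t) in (-pi, pi]."""
    e = np.hypot(uc, us)          # sqrt(uc**2 + us**2)
    theta = np.arctan2(us, uc)    # quadrant-aware inverse tangent of us/uc
    return e, theta


# Illustrative (hypothetical) carrier and baseband waveforms.
fc = 50.0
t = np.linspace(0.0, 1.0, 2001)
uc = np.cos(2 * np.pi * 2 * t)
us = np.sin(2 * np.pi * 3 * t)

e, theta = iq_to_polar(uc, us)
up_rect = uc * np.cos(2 * np.pi * fc * t) - us * np.sin(2 * np.pi * fc * t)  # (2.68)
up_polar = e * np.cos(2 * np.pi * fc * t + theta)                             # (2.70)
print(np.max(np.abs(up_rect - up_polar)))  # on the order of floating-point round-off
```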


Figure  2.29:  Geometry  of  the  complex  envelope. 

Complex envelope: To obtain a third representation of a passband signal, we note that a two-dimensional point can also be mapped to a complex number; see Section 2.1. We define the complex envelope $u(t)$ of the passband signal $u_p(t)$ in (2.68) and (2.70) as follows:

$$u(t) = u_c(t) + j u_s(t) = e(t)\, e^{j\theta(t)} \qquad (2.71)$$

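We can now express the passband signal in terms of its complex envelope. From (2.70), we see that

$$u_p(t) = e(t)\,\mathrm{Re}\left(e^{j(2\pi f_c t + \theta(t))}\right) = \mathrm{Re}\left(e(t)\, e^{j\theta(t)}\, e^{j 2\pi f_c t}\right) = \mathrm{Re}\left(u(t)\, e^{j 2\pi f_c t}\right)$$

This leads to our third representation of a passband signal:

$$u_p(t) = \mathrm{Re}\left(u(t)\, e^{j 2\pi f_c t}\right) \qquad (2.72)$$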

While we have obtained (2.72) using the polar representation (2.70), we should also check that it is consistent with the rectangular representation (2.68), by writing out the real and imaginary parts of the complex waveforms above as follows:

$$\begin{aligned} u(t)\, e^{j 2\pi f_c t} &= \left(u_c(t) + j u_s(t)\right)\left(\cos 2\pi f_c t + j\sin 2\pi f_c t\right) \\ &= \left(u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t\right) + j\left(u_s(t)\cos 2\pi f_c t + u_c(t)\sin 2\pi f_c t\right) \end{aligned}$$

Taking the real part, we obtain the expression (2.68) for $u_p(t)$.
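The three representations can also be compared directly on samples. The following NumPy sketch, with a hypothetical carrier and hypothetical I and Q waveforms, builds $u(t)$ as in (2.71) and checks that the rectangular form (2.68), the polar form (2.70) and the complex form (2.72) give the same $u_p(t)$, and that the imaginary part matches the expansion above.

```python
import numpy as np

fc = 50.0                                    # hypothetical carrier frequency (Hz)
t = np.linspace(0.0, 1.0, 2001)
uc = np.exp(-((t - 0.5) / 0.1) ** 2)         # hypothetical I component
us = 0.5 * np.cos(2 * np.pi * 4 * t)         # hypothetical Q component

u = uc + 1j * us                             # complex envelope, (2.71)
w = 2 * np.pi * fc * t
product = u * np.exp(1j * w)                 # u(t) exp(j 2 pi fc t)

up_rect = uc * np.cos(w) - us * np.sin(w)           # (2.68)
up_polar = np.abs(u) * np.cos(w + np.angle(u))      # (2.70) with e = |u|, theta = arg(u)
up_complex = product.real                           # (2.72)
imag_expected = us * np.cos(w) + uc * np.sin(w)     # imaginary part from the expansion
```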

The  relationship  between  the  three  time  domain  representations  of  a  passband  signal  in  terms 
of  its  complex  envelope  is  depicted  in  Figure  2.29.  We  now  specify  the  corresponding  frequency 
domain  relationship. 

Information resides in complex baseband: The complex baseband representation corresponds to subtracting out the rapid, but predictable, phase variation due to the fixed reference frequency $f_c$, and then considering the much slower amplitude and phase variations induced by baseband modulation. Since the phase variation due to $f_c$ is predictable, it cannot convey any information. Thus, all the information in a passband signal is contained in its complex envelope.

Choice of frequency/phase reference is arbitrary: We can define the complex baseband representation of a passband signal using an arbitrary frequency reference $f_c$ (and can also vary the phase reference), as long as we satisfy $f_c > W$, where $W$ is the bandwidth. We may often wish to convert between the complex baseband representations for two different references. For example, we can write

$$u_p(t) = u_{c1}(t)\cos(2\pi f_1 t + \theta_1) - u_{s1}(t)\sin(2\pi f_1 t + \theta_1) = u_{c2}(t)\cos(2\pi f_2 t + \theta_2) - u_{s2}(t)\sin(2\pi f_2 t + \theta_2)$$

We can express this more compactly in terms of the complex envelopes $u_1 = u_{c1} + j u_{s1}$ and $u_2 = u_{c2} + j u_{s2}$:

$$u_p(t) = \mathrm{Re}\left(u_1(t)\, e^{j(2\pi f_1 t + \theta_1)}\right) = \mathrm{Re}\left(u_2(t)\, e^{j(2\pi f_2 t + \theta_2)}\right) \qquad (2.74)$$

We can now find the relationship between these complex envelopes by transforming the exponential term for one reference to the other:

$$u_p(t) = \mathrm{Re}\left(u_1(t)\, e^{j(2\pi f_1 t + \theta_1)}\right) = \mathrm{Re}\left(\left[u_1(t)\, e^{j(2\pi(f_1 - f_2)t + \theta_1 - \theta_2)}\right] e^{j(2\pi f_2 t + \theta_2)}\right) \qquad (2.75)$$

Comparing the extreme right-hand sides of (2.74) and (2.75), we can read off that

$$u_2(t) = u_1(t)\, e^{j(2\pi(f_1 - f_2)t + \theta_1 - \theta_2)}$$

While we derived this result using algebraic manipulations, it has the following intuitive interpretation: if the instantaneous phase $2\pi f_1 t + \theta_1$ of the first reference is ahead of (behind) the instantaneous phase $2\pi f_2 t + \theta_2$ of the second, then the complex envelope $u_1$ must be correspondingly retarded (advanced) relative to $u_2$, so that the instantaneous phase of the overall passband signal stays the same. We illustrate this via some examples below.
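The conversion rule $u_2 = u_1 e^{j(2\pi(f_1 - f_2)t + \theta_1 - \theta_2)}$ is a one-line operation on samples. The sketch below (NumPy assumed; the helper names `change_reference` and `to_passband` are our own) applies it to $I_{[-1,1]}(t)\cos 400\pi t$, sampled only on its support $[-1, 1]$, using the two references that appear in Example 2.8.2, and confirms that each new envelope regenerates the same passband waveform.

```python
import numpy as np


def change_reference(u1, t, f1, th1, f2, th2):
    """Re-express a complex envelope defined w.r.t. exp(j(2 pi f1 t + th1))
    relative to the reference exp(j(2 pi f2 t + th2))."""
    return u1 * np.exp(1j * (2 * np.pi * (f1 - f2) * t + th1 - th2))


def to_passband(u, t, f, th):
    """Passband signal Re(u(t) exp(j(2 pi f t + th)))."""
    return np.real(u * np.exp(1j * (2 * np.pi * f * t + th)))


t = np.linspace(-1.0, 1.0, 4001)
u1 = np.ones_like(t, dtype=complex)   # I_[-1,1](t) w.r.t. 200 Hz, zero phase

u_a = change_reference(u1, t, 200.0, 0.0, 200.5, 0.0)         # reference exp(j 401 pi t)
u_b = change_reference(u1, t, 200.0, 0.0, 200.0, -np.pi / 4)  # reference exp(j(400 pi t - pi/4))
```

Here `u_a` comes out as $e^{-j\pi t}$ and `u_b` as the constant $e^{j\pi/4}$ on $[-1, 1]$.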

Example 2.8.2 (Change of reference frequency/phase) Consider the passband signal $u_p(t) = I_{[-1,1]}(t)\cos 400\pi t$.

(a) Find the output when $u_p(t)\cos 401\pi t$ is passed through a lowpass filter.

(b) Find the output when $u_p(t)\sin(400\pi t - \frac{\pi}{4})$ is passed through a lowpass filter.

Solution: From Figure 2.28, we recognize that both (a) and (b) correspond to downconversion operations with different frequency and phase references. Thus, by converting the complex envelope with respect to the appropriate reference, we can read off the answers.

(a) Letting $u_1 = u_{c1} + j u_{s1}$ denote the complex envelope with respect to the reference $e^{j 401\pi t}$, we recognize that the output of the LPF is $u_{c1}/2$. The passband signal can be written as

$$u_p(t) = I_{[-1,1]}(t)\cos 400\pi t = \mathrm{Re}\left(I_{[-1,1]}(t)\, e^{j 400\pi t}\right)$$

We can now massage it to read off the complex envelope for the new reference:

$$u_p(t) = \mathrm{Re}\left(I_{[-1,1]}(t)\, e^{-j\pi t}\, e^{j 401\pi t}\right)$$

from which we see that $u_1(t) = I_{[-1,1]}(t)\, e^{-j\pi t} = I_{[-1,1]}(t)(\cos\pi t - j\sin\pi t)$. Taking real and imaginary parts, we obtain $u_{c1}(t) = I_{[-1,1]}(t)\cos\pi t$ and $u_{s1}(t) = -I_{[-1,1]}(t)\sin\pi t$, respectively. Thus, the LPF output is $\frac{1}{2} I_{[-1,1]}(t)\cos\pi t$.

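(b) Letting $u_2 = u_{c2} + j u_{s2}$ denote the complex envelope with respect to the reference $e^{j(400\pi t - \pi/4)}$, we recognize that the output of the LPF is $-u_{s2}/2$. We can convert to the new reference as before:

$$u_p(t) = \mathrm{Re}\left(I_{[-1,1]}(t)\, e^{j\pi/4}\, e^{j(400\pi t - \pi/4)}\right)$$

which gives the complex envelope $u_2 = I_{[-1,1]}(t)\, e^{j\pi/4} = I_{[-1,1]}(t)\left(\cos\frac{\pi}{4} + j\sin\frac{\pi}{4}\right)$. Taking real and imaginary parts, we obtain $u_{c2}(t) = I_{[-1,1]}(t)\cos\frac{\pi}{4}$ and $u_{s2}(t) = I_{[-1,1]}(t)\sin\frac{\pi}{4}$, respectively. Thus, the LPF output is given by $-u_{s2}/2 = -\frac{1}{2} I_{[-1,1]}(t)\sin\frac{\pi}{4} = -\frac{1}{2\sqrt{2}}\, I_{[-1,1]}(t)$.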
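Both answers can also be checked by brute force: sample the mixer outputs and apply an ideal lowpass filter. The NumPy sketch below assumes a 2 kHz sample rate, a 4 s window and a 50 Hz cutoff (all our choices; the example does not specify the filter), and implements the brick-wall filter by zeroing DFT bins. Because $I_{[-1,1]}$ is not strictly bandlimited, the filtered output only approximately matches the predicted waveforms, so the comparison is restricted to $|t| \le 0.5$, away from the box edges.

```python
import numpy as np


def ideal_lowpass(x, fs, cutoff):
    """Brick-wall lowpass filter applied via the DFT (treats x as periodic)."""
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x), d=1.0 / fs)
    X[np.abs(f) > cutoff] = 0.0
    return np.real(np.fft.ifft(X))


fs = 2000.0                                 # sample rate (Hz), assumed
t = np.arange(-4000, 4000) / fs             # window [-2, 2) s
box = (np.abs(t) <= 1.0).astype(float)      # I_[-1,1](t)
up = box * np.cos(400 * np.pi * t)          # passband signal of Example 2.8.2

out_a = ideal_lowpass(up * np.cos(401 * np.pi * t), fs, cutoff=50.0)
out_b = ideal_lowpass(up * np.sin(400 * np.pi * t - np.pi / 4), fs, cutoff=50.0)

pred_a = 0.5 * box * np.cos(np.pi * t)      # answer to (a)
pred_b = -box / (2 * np.sqrt(2))            # answer to (b)

mid = np.abs(t) <= 0.5                      # stay away from the box edges (Gibbs ripple)
err_a = np.max(np.abs(out_a - pred_a)[mid])
err_b = np.max(np.abs(out_b - pred_b)[mid])
print(err_a, err_b)
```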
From a practical point of view, keeping track of frequency/phase references becomes important for the task of synchronization. For example, the carrier frequency used by the transmitter for upconversion may not be exactly equal to that used by the receiver for downconversion. Thus, the receiver must compensate for the phase rotation incurred by the complex envelope at the output of the downconverter, as illustrated by the following example.

Example 2.8.3 (Modeling and compensating for frequency/phase offsets in complex baseband): Consider the passband signal $u_p$ in (2.68), with complex baseband representation $u = u_c + j u_s$. Now, consider a phase-shifted version of the passband signal

$$\tilde{u}_p(t) = u_c(t)\cos(2\pi f_c t + \theta(t)) - u_s(t)\sin(2\pi f_c t + \theta(t))$$

where $\theta(t)$ may vary slowly with time. For example, a carrier frequency offset $\Delta f$ and a phase offset $\gamma$ correspond to $\theta(t) = 2\pi\Delta f t + \gamma$. Suppose, now, that the signal is downconverted as in Figure 2.28, where we take the phase reference as that of the receiver’s local oscillator (LO). How do the I and Q components depend on the phase offset of the received signal relative to the LO? The easiest way to answer this is to find the complex envelope of $\tilde{u}_p$ with respect to $f_c$. To do this, we write $\tilde{u}_p$ in the standard form (2.72) as follows:


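$$\tilde{u}_p(t) = \mathrm{Re}\left(u(t)\, e^{j(2\pi f_c t + \theta(t))}\right)$$

Comparing with the desired form

$$\tilde{u}_p(t) = \mathrm{Re}\left(\tilde{u}(t)\, e^{j 2\pi f_c t}\right)$$

we can read off

$$\tilde{u}(t) = u(t)\, e^{j\theta(t)} \qquad (2.76)$$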

Equation (2.76) relates the complex envelopes before and after a phase offset. We can expand out this “polar form” representation to obtain the corresponding relationship between the I and Q components. Suppressing time dependence from the notation, we can rewrite (2.76) as

$$\tilde{u}_c + j\tilde{u}_s = (u_c + j u_s)(\cos\theta + j\sin\theta)$$

using Euler’s formula. Equating real and imaginary parts on both sides, we obtain

$$\begin{aligned} \tilde{u}_c &= u_c\cos\theta - u_s\sin\theta \\ \tilde{u}_s &= u_c\sin\theta + u_s\cos\theta \end{aligned} \qquad (2.77)$$


The phase offset therefore results in the I and Q components being mixed together at the output of the downconverter. Thus, for a coherent receiver to recover the original I and Q components $u_c$, $u_s$, we must account for the (possibly time-varying) phase offset $\theta(t)$. In particular, if we have an estimate of the phase offset, then we can undo it by inverting the relationship in (2.76):

$$u(t) = \tilde{u}(t)\, e^{-j\theta(t)} \qquad (2.78)$$

which can be written out in terms of real-valued operations as follows:

$$\begin{aligned} u_c &= \tilde{u}_c\cos\theta + \tilde{u}_s\sin\theta \\ u_s &= -\tilde{u}_c\sin\theta + \tilde{u}_s\cos\theta \end{aligned} \qquad (2.79)$$


The preceding computations provide a typical example of the advantage of working in complex baseband. Relationships between passband signals can be compactly represented in complex baseband, as in (2.76) and (2.78). For signal processing using real-valued arithmetic, these complex baseband relationships can be expanded out to obtain relationships involving real-valued quantities, as in (2.77) and (2.79). See Software Lab 2.1 for an example of such computations.
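Relations (2.77) and (2.79) translate directly into real-valued code. In the NumPy sketch below, the function names, the random $\pm 1$ I/Q values and the offset parameters ($\Delta f = 5$ Hz, $\gamma = 0.3$ rad, sample spacing $10^{-4}$ s) are hypothetical, and the receiver is assumed to know $\theta(t)$ exactly; in practice that knowledge must come from a synchronization loop or estimator.

```python
import numpy as np


def apply_phase_offset(uc, us, theta):
    """I/Q mixing caused by a phase offset theta, following (2.77)."""
    return (uc * np.cos(theta) - us * np.sin(theta),
            uc * np.sin(theta) + us * np.cos(theta))


def undo_phase_offset(uc_t, us_t, theta_hat):
    """Derotation with real arithmetic, following (2.79)."""
    return (uc_t * np.cos(theta_hat) + us_t * np.sin(theta_hat),
            -uc_t * np.sin(theta_hat) + us_t * np.cos(theta_hat))


rng = np.random.default_rng(0)
n = np.arange(1000)
Ts = 1e-4                                     # hypothetical sample spacing (s)
uc = rng.choice([-1.0, 1.0], size=n.size)     # hypothetical I samples
us = rng.choice([-1.0, 1.0], size=n.size)     # hypothetical Q samples
theta = 2 * np.pi * 5.0 * n * Ts + 0.3        # theta = 2 pi (delta f) t + gamma

uc_t, us_t = apply_phase_offset(uc, us, theta)      # downconverter output
uc_hat, us_hat = undo_phase_offset(uc_t, us_t, theta)  # perfect estimate assumed
```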
Figure 2.30: Frequency domain relationship between a real-valued passband signal and its complex envelope. The figure shows the spectrum $U_p(f)$ of the passband signal, its scaled restriction to positive frequencies $C(f)$, and the spectrum $U(f)$ of the complex envelope.
2.8.2 Frequency Domain Relationships

Consider an arbitrary complex-valued baseband waveform $u(t)$ whose frequency content is contained in $[-W, W]$, and suppose that $f_c > W$. We want to show that

$$u_p(t) = \mathrm{Re}\left(u(t)\, e^{j 2\pi f_c t}\right) = \mathrm{Re}\left(c(t)\right) \qquad (2.80)$$

is a real-valued passband signal whose frequency content is concentrated around $\pm f_c$, away from DC. Let

$$c(t) = u(t)\, e^{j 2\pi f_c t} \leftrightarrow C(f) = U(f - f_c) \qquad (2.81)$$

That is, $C(f)$ is the complex envelope spectrum $U(f)$, shifted to the right by $f_c$. Since $U(f)$ has frequency content in $[-W, W]$, $C(f)$ has frequency content in $[f_c - W, f_c + W]$. Since $f_c - W > 0$, this band does not include DC. Now,

$$u_p(t) = \mathrm{Re}\left(c(t)\right) = \frac{1}{2}\left(c(t) + c^*(t)\right) \leftrightarrow U_p(f) = \frac{1}{2}\left(C(f) + C^*(-f)\right)$$

Since $C^*(-f)$ is the complex conjugated version of $C(f)$, flipped around the origin, it has frequency content in the band of negative frequencies $[-f_c - W, -f_c + W]$ around $-f_c$, which does not include DC because $-f_c + W < 0$. Thus, we have shown that $u_p(t)$ is a passband signal. It is real-valued by virtue of its construction using the time domain equation (2.80), which involves taking the real part. But we can also double-check for consistency in the frequency domain: $U_p(f)$ is conjugate symmetric, since its positive frequency component is $\frac{1}{2}C(f)$, and its negative frequency component is $\frac{1}{2}C^*(-f)$. Substituting $U(f - f_c)$ for $C(f)$, we obtain the passband spectrum in terms of the complex baseband spectrum:

$$U_p(f) = \frac{1}{2}\left(U(f - f_c) + U^*(-f - f_c)\right) \qquad (2.82)$$

So far, we have seen how to construct a real-valued passband signal given a complex-valued baseband signal. To go in reverse, we must answer the following: do the equivalent representations (2.68), (2.70), (2.72) and (2.82) hold for any passband signal, and if so, how do we find the spectrum of the complex envelope given the spectrum of the passband signal? To answer these questions, we simply trace back the steps we used to arrive at (2.82). Given the spectrum $U_p(f)$ for a real-valued passband signal $u_p(t)$, we construct $C(f)$ as a scaled version of $U_p^+(f) = U_p(f)\, I_{[0,\infty)}(f)$, the positive frequency part of $U_p(f)$, as follows:

$$C(f) = 2U_p^+(f) = \begin{cases} 2U_p(f), & f > 0 \\ 0, & f < 0 \end{cases}$$

This means that $U_p(f) = \frac{1}{2}C(f)$ for positive frequencies. By the conjugate symmetry of $U_p(f)$, the negative frequency component must be $\frac{1}{2}C^*(-f)$, so that $U_p(f) = \frac{1}{2}C(f) + \frac{1}{2}C^*(-f)$. In the time domain, this corresponds to

$$u_p(t) = \frac{1}{2}c(t) + \frac{1}{2}c^*(t) = \mathrm{Re}\left(c(t)\right) \qquad (2.83)$$

Now, let us define the complex envelope as follows:

$$u(t) = c(t)\, e^{-j 2\pi f_c t} \leftrightarrow U(f) = C(f + f_c)$$

Since $c(t) = u(t)\, e^{j 2\pi f_c t}$, we obtain the desired relationship (2.68) on substituting into (2.83).

Since $C(f)$ has frequency content in a band around $f_c$, $U(f)$, which is obtained by shifting $C(f)$ to the left by $f_c$, is indeed a baseband signal with frequency content in a band around DC.
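The reverse construction (keep the positive frequencies, double them, then shift down by $f_c$) can be carried out with the DFT. The following is a minimal NumPy sketch; the function name and test signal are hypothetical, and it assumes the record holds an integer number of periods of every tone, so that the DFT reproduces the spectra exactly. For general signals this is only an approximation, since the DFT treats the record as periodic.

```python
import numpy as np


def complex_envelope_from_passband(up, fs, fc):
    """Recover u(t) from real samples of up(t): C(f) = 2 Up(f) for f > 0
    (zero elsewhere), c(t) = IDFT of C, then u(t) = c(t) exp(-j 2 pi fc t)."""
    N = len(up)
    Up = np.fft.fft(up)
    f = np.fft.fftfreq(N, d=1.0 / fs)
    C = np.where(f > 0, 2.0 * Up, 0.0)
    c = np.fft.ifft(C)
    t = np.arange(N) / fs
    return c * np.exp(-2j * np.pi * fc * t)


fs, fc = 1000.0, 50.0
t = np.arange(1000) / fs                    # exactly 1 s: every tone below lies on a DFT bin
uc = np.cos(2 * np.pi * 3 * t)              # hypothetical I component
us = 0.5 * np.sin(2 * np.pi * 5 * t)        # hypothetical Q component
up = uc * np.cos(2 * np.pi * fc * t) - us * np.sin(2 * np.pi * fc * t)
u_hat = complex_envelope_from_passband(up, fs, fc)
```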
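Frequency domain expressions for I and Q components: If we are given the time domain complex envelope, we can read off the I and Q components as the real and imaginary parts:

$$\begin{aligned} u_c(t) &= \mathrm{Re}\left(u(t)\right) = \frac{1}{2}\left(u(t) + u^*(t)\right) \\ u_s(t) &= \mathrm{Im}\left(u(t)\right) = \frac{1}{2j}\left(u(t) - u^*(t)\right) \end{aligned}$$

Taking Fourier transforms, we obtain

$$\begin{aligned} U_c(f) &= \frac{1}{2}\left(U(f) + U^*(-f)\right) \\ U_s(f) &= \frac{1}{2j}\left(U(f) - U^*(-f)\right) \end{aligned}$$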


Figure 2.30 shows the relation between the passband signal $U_p(f)$, its scaled version $C(f)$ restricted to positive frequencies, and the complex baseband signal $U(f)$. As this example emphasizes, all of these spectra can, in general, be complex-valued. Equation (2.80) corresponds to starting with an arbitrary baseband signal $U(f)$ as in the bottom of the figure, and constructing $C(f)$ as depicted in the middle of the figure. We then use $C(f)$ to construct a conjugate symmetric passband signal $U_p(f)$, proceeding from the middle of the figure to the top. This example also shows that $U(f)$ does not, in general, obey conjugate symmetry, so that the baseband signal $u(t)$ is, in general, complex-valued. However, by construction, $U_p(f)$ is conjugate symmetric, and hence the passband signal $u_p(t)$ is real-valued.
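The frequency domain expressions for $U_c(f)$ and $U_s(f)$ have an exact discrete counterpart: for a length-$N$ DFT, the transform of $u^*[n]$ is $U^*[(-k) \bmod N]$, the discrete analogue of $U^*(-f)$. The NumPy sketch below (the random complex envelope and the helper name `conj_flip` are purely illustrative) forms $U_c$ and $U_s$ from $U$ alone and compares them with the DFTs of the real and imaginary parts.

```python
import numpy as np


def conj_flip(X):
    """Return X*[(-k) mod N], the DFT of the conjugated sequence x*[n]."""
    return np.conj(np.roll(X[::-1], 1))


rng = np.random.default_rng(1)
u = rng.standard_normal(64) + 1j * rng.standard_normal(64)  # hypothetical envelope samples
U = np.fft.fft(u)

Uc = 0.5 * (U + conj_flip(U))        # spectrum of the I component
Us = (U - conj_flip(U)) / (2j)       # spectrum of the Q component
```

As in the figure, `U` itself is not conjugate symmetric, while `Uc` and `Us` are, since they are spectra of real sequences.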


Example 2.8.4 Let $v_p(t)$ denote a real-valued passband signal, with Fourier transform $V_p(f)$ specified as follows for negative frequencies:

$$V_p(f) = \begin{cases} -(f + 99), & -101 \leq f \leq -99 \\ 0, & f < -101 \text{ or } -99 < f < 0 \end{cases}$$

(a) Sketch $V_p(f)$ for both positive and negative frequencies.

(b) Without explicitly taking the inverse Fourier transform, can you say whether $v_p(t) = v_p(-t)$ or not?

(c) Find and sketch $V_c(f)$ and $V_s(f)$, the Fourier transforms of the I and Q components with respect to a reference frequency $f_c = 99$. Do this without going to the time domain.

(d) Find an explicit time domain expression for the output when $v_p(t)\cos 200\pi t$ is passed through an ideal lowpass filter of bandwidth 4.

(e) Find an explicit time domain expression for the output when $v_p(t)\sin 202\pi t$ is passed through an ideal lowpass filter of bandwidth 4.

Solution: 


Figure  2.31:  Sketch  of  passband  spectrum  for  Example  2.8.4. 


(a) Since $v_p(t)$ is real-valued, we have $V_p(f) = V_p^*(-f)$. Since the spectrum is also given to be real-valued for $f < 0$, we have $V_p^*(-f) = V_p(-f)$, so that $V_p(f) = V_p(-f)$. The spectrum is sketched in Figure 2.31.

(b) Yes, $v_p(t) = v_p(-t)$. Since $v_p(t)$ is real-valued, we have $v_p(-t) = v_p^*(-t) \leftrightarrow V_p^*(f)$. But $V_p^*(f) = V_p(f)$, since the spectrum is real-valued.


Figure 2.32: Sketch of I and Q spectra in Example 2.8.4(c), taking reference frequency $f_c = 99$.


(c) The spectrum of the complex envelope and the I and Q components are shown in Figure 2.32. The complex envelope is obtained as $V(f) = 2V_p^+(f + f_c)$, while the I and Q components satisfy

$$V_c(f) = \frac{V(f) + V^*(-f)}{2}, \qquad jV_s(f) = \frac{V(f) - V^*(-f)}{2}$$

In our case, $V_c(f) = |f|\, I_{[-2,2]}(f)$ and $jV_s(f) = f\, I_{[-2,2]}(f)$ are real-valued, and are plotted in the figure.


Figure 2.33: Finding the I component in Example 2.8.4(d), taking reference frequency as $f_c = 100$.

(d) The output of the LPF is $v_c(t)/2$, where $v_c$ is the I component with respect to $f_c = 100$. In Figure 2.33, we construct the complex envelope and the I component as in (c), except that the reference frequency is different. The I spectrum is now a boxcar, $V_c(f) = 2\, I_{[-1,1]}(f)$, which corresponds to $v_c(t) = 4\,\mathrm{sinc}(2t)$, so that the output is $2\,\mathrm{sinc}(2t)$.
Figure 2.34: Finding the Q component in Example 2.8.4(e), taking reference frequency as $f_c = 101$.


(e) The output of the LPF is $-v_s(t)/2$, where $v_s$ is the Q component with respect to $f_c = 101$. In Figure 2.34, we construct the complex envelope and the Q component as in (c), except that the reference frequency is different. We now have to take the inverse Fourier transform, which is a little painful if we do it from scratch. Instead, let us differentiate to see that

$$\frac{d}{df}\left(jV_s(f)\right) = I_{[-2,2]}(f) - 4\delta(f) \leftrightarrow 4\,\mathrm{sinc}(4t) - 4$$

But $\frac{d}{df}V_s(f) \leftrightarrow -j2\pi t\, v_s(t)$, so that $j\frac{d}{df}V_s(f) \leftrightarrow 2\pi t\, v_s(t)$. We therefore obtain that $2\pi t\, v_s(t) = 4\,\mathrm{sinc}(4t) - 4$, or $v_s(t) = \frac{2(\mathrm{sinc}(4t) - 1)}{\pi t}$. Thus, the output of the LPF is $-v_s(t)/2$, or

$$\frac{1 - \mathrm{sinc}(4t)}{\pi t}$$
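Both closed forms in (d) and (e) can be double-checked by numerically inverting the relevant spectra. For (d) the I spectrum with respect to $f_c = 100$ is $V_c(f) = 2I_{[-1,1]}(f)$; for (e) the Q spectrum with respect to $f_c = 101$, obtained from $V(f) = 2V_p^+(f + 101)$ and the formula for $jV_s$ in (c), is $jV_s(f) = f + 2$ on $[-2, 0)$ and $f - 2$ on $(0, 2]$. That piecewise form is our own reconstruction from those formulas. The NumPy sketch uses a midpoint-rule integral; note that `np.sinc` is the normalized sinc $\sin(\pi x)/(\pi x)$, matching the convention in the text.

```python
import numpy as np


def inverse_ft(spectrum, t, f_lo, f_hi, n=100_000):
    """Midpoint-rule approximation of x(t) = int_{f_lo}^{f_hi} X(f) exp(j 2 pi f t) df."""
    df = (f_hi - f_lo) / n
    f = f_lo + (np.arange(n) + 0.5) * df
    return np.array([np.sum(spectrum(f) * np.exp(2j * np.pi * f * tk)) * df for tk in t])


t = np.array([0.13, 0.25, 0.4, 0.77, 1.3])

# (d): I component w.r.t. fc = 100 is a boxcar of height 2 on [-1, 1].
vc = inverse_ft(lambda f: 2.0 * np.ones_like(f), t, -1.0, 1.0)
out_d = vc / 2

# (e): Q component w.r.t. fc = 101, with jVs(f) = f + 2 on [-2, 0) and f - 2 on (0, 2].
vs = inverse_ft(lambda f: -1j * np.where(f < 0, f + 2.0, f - 2.0), t, -2.0, 2.0)
out_e = -vs / 2

print(np.c_[t, out_d.real, 2 * np.sinc(2 * t), out_e.real, (1 - np.sinc(4 * t)) / (np.pi * t)])
```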


Figure 2.35: The relationship between passband filtering and its complex baseband analogue.

Figure 2.36: Complex baseband realization of passband filter. The constant scale factors of $\frac{1}{2}$ have been omitted.

2.8.3  Complex  baseband  equivalent  of  passband  filtering 

We now state another result that is extremely relevant to transceiver operations; namely, any passband filter can be implemented in complex baseband. This result applies to filtering operations that we desire to perform at the transmitter (e.g., to conform to spectral masks), at the receiver (e.g., to filter out noise), and to a broad class of channels modeled as linear filters. Suppose that a passband signal $u_p(t) = u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t$ is passed through a passband filter with impulse response $h_p(t) = h_c(t)\cos 2\pi f_c t - h_s(t)\sin 2\pi f_c t$ to get an output $y_p(t) = (u_p * h_p)(t)$. In the frequency domain, $Y_p(f) = H_p(f)U_p(f)$, so that the output $y_p(t)$ is also passband, and can be written as $y_p(t) = y_c(t)\cos 2\pi f_c t - y_s(t)\sin 2\pi f_c t$. How are the I and Q components of the output related to those of the input and the filter impulse response? We now show that a compact answer is given in terms of complex envelopes: the complex envelope $y$ is the convolution of the complex envelopes of the input and the impulse response, up to a scale factor. Let $y$, $u$ and $h$ denote the complex envelopes for $y_p$, $u_p$ and $h_p$, respectively, with respect to a common frequency reference $f_c$. Since real-valued passband signals are completely characterized by their spectra for positive frequencies, the passband filtering equation $Y_p(f) = U_p(f)H_p(f)$ can be separately (and redundantly) written out for positive and negative frequencies, because the spectra are conjugate symmetric around the origin, and there is no energy around $f = 0$. Thus, focusing on the positive frequency segments $Y_p^+(f) = Y_p(f)\, I_{\{f>0\}}$, $U_p^+(f) = U_p(f)\, I_{\{f>0\}}$, $H_p^+(f) = H_p(f)\, I_{\{f>0\}}$, we have $Y_p^+(f) = U_p^+(f)H_p^+(f)$, from which we conclude that the spectrum of the complex envelope $y$ is given by

$$Y(f) = 2Y_p^+(f + f_c) = 2U_p^+(f + f_c)\, H_p^+(f + f_c) = \frac{1}{2}U(f)H(f)$$

Figure 2.35 depicts the relationship between the passband and complex baseband waveforms in the frequency domain, and supplies a pictorial proof of the preceding relationship. We now restate this important result in the time domain:

$$y(t) = \frac{1}{2}(u * h)(t) \qquad (2.84)$$

A practical consequence of this is that any desired passband filtering function can be realized in complex baseband. As shown in Figure 2.36, this requires four real baseband filters: writing out the real and imaginary parts of (2.84), we obtain

$$y_c = \frac{1}{2}\left(u_c * h_c - u_s * h_s\right), \qquad y_s = \frac{1}{2}\left(u_s * h_c + u_c * h_s\right) \qquad (2.85)$$
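In discrete time, (2.85) says that one complex convolution can be carried out as four real ones. The NumPy sketch below uses arbitrary random sequences as stand-ins for sampled I/Q components and filter taps (so convolution sums replace the integrals), and checks the four-filter structure against a direct complex `np.convolve`; the function name is our own.

```python
import numpy as np


def complex_baseband_filter(uc, us, hc, hs):
    """Realize y = (1/2)(u * h) with four real convolutions, as in (2.85)."""
    yc = 0.5 * (np.convolve(uc, hc) - np.convolve(us, hs))
    ys = 0.5 * (np.convolve(us, hc) + np.convolve(uc, hs))
    return yc, ys


rng = np.random.default_rng(2)
uc, us = rng.standard_normal(50), rng.standard_normal(50)   # hypothetical input I/Q samples
hc, hs = rng.standard_normal(8), rng.standard_normal(8)     # hypothetical filter I/Q taps

yc, ys = complex_baseband_filter(uc, us, hc, hs)
y_ref = 0.5 * np.convolve(uc + 1j * us, hc + 1j * hs)       # discrete version of (2.84)
```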
Figure 2.37: Convolution of two boxes for Example 2.8.5.


Example 2.8.5 The passband signal $u(t) = I_{[-1,1]}(t)\cos 100\pi t$ is passed through the passband filter $h(t) = I_{[0,3]}(t)\sin 100\pi t$. Find an explicit time domain expression for the filter output.

Solution: We need to find the convolution $y_p(t)$ of the signal $u_p(t) = I_{[-1,1]}(t)\cos 100\pi t$ with the impulse response $h_p(t) = I_{[0,3]}(t)\sin 100\pi t$, where we have inserted the subscript $p$ to explicitly denote that the signals are passband. The corresponding relationship in complex baseband is $y = (1/2)\, u * h$. Taking a reference frequency $f_c = 50$, we can read off the complex envelopes $u(t) = I_{[-1,1]}(t)$ and $h(t) = -jI_{[0,3]}(t)$, so that

$$y = (-j/2)\, I_{[-1,1]}(t) * I_{[0,3]}(t)$$

Let $s(t) = (1/2)\, I_{[-1,1]}(t) * I_{[0,3]}(t)$ denote the trapezoid obtained by convolving the two boxes, as shown in Figure 2.37. Then

$$y(t) = -j s(t)$$

That is, $y_c = 0$ and $y_s = -s(t)$, so that $y_p(t) = s(t)\sin 100\pi t$.
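To see Example 2.8.5 numerically, the NumPy sketch below approximates the passband convolution by a Riemann sum (the step `dt` is our choice), repeats the computation in complex baseband with $u = I_{[-1,1]}$, $h = -jI_{[0,3]}$ and $y = \frac{1}{2}u * h$, and compares both with the closed form $s(t)\sin 100\pi t$. The small residual comes from discretization and from the double-frequency term, which does not vanish exactly because the boxcars are not strictly bandlimited.

```python
import numpy as np

dt = 5e-4                                   # integration step (s), an assumed value
fc = 50.0
tu = -1.0 + dt * np.arange(4001)            # samples of the support [-1, 1] of up
th = dt * np.arange(6001)                   # samples of the support [0, 3] of hp

# Passband route: Riemann-sum approximation of (up * hp)(t).
up = np.cos(100 * np.pi * tu)               # I_[-1,1](t) cos(100 pi t) on its support
hp = np.sin(100 * np.pi * th)               # I_[0,3](t) sin(100 pi t) on its support
yp = np.convolve(up, hp) * dt
ty = -1.0 + dt * np.arange(yp.size)         # output time axis starts at -1 + 0

# Complex baseband route: y = (1/2) u * h with u = I_[-1,1], h = -j I_[0,3].
u = np.ones(tu.size)
h = -1j * np.ones(th.size)
y = 0.5 * np.convolve(u, h) * dt
yp_bb = np.real(y * np.exp(2j * np.pi * fc * ty))

# Closed form: trapezoid s(t) = (1/2) * (length of overlap of [-1, 1] and [t - 3, t]).
s = 0.5 * np.clip(np.minimum(1.0, ty) - np.maximum(-1.0, ty - 3.0), 0.0, None)
yp_closed = s * np.sin(100 * np.pi * ty)

print(np.max(np.abs(yp - yp_closed)), np.max(np.abs(yp_bb - yp_closed)))
```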


2.8.4  General  Comments  on  Complex  Baseband 

Remark 2.8.1 (Complex Baseband in Transceiver Implementations) Given the equivalence of passband and complex baseband, and the fact that key operations such as linear filtering can be performed in complex baseband, it is understandable why, in typical modern passband transceivers, most of the intelligence is moved to baseband processing. For moderate bandwidths at which analog-to-digital and digital-to-analog conversion can be accomplished inexpensively, baseband operations can be efficiently performed in DSP. These digital algorithms are independent of the passband over which communication eventually occurs, and are amenable to a variety of low-cost implementations, including Very Large Scale Integrated Circuits (VLSI), Field Programmable Gate Arrays (FPGA), and general purpose DSP engines. On the other hand, analog components such as local oscillators, power amplifiers and low noise amplifiers must be optimized for the bands of interest, and are often bulky. Thus, the trend in modern transceivers is to accomplish as much as possible using baseband DSP algorithms. For example, complicated filters shaping the transmitted waveform to a spectral mask dictated by the FCC can be achieved with baseband DSP algorithms, allowing the use of relatively sloppy analog filters at passband. Another example is the elimination of analog phase locked loops for carrier synchronization in many modern receivers; the receiver instead employs a fixed analog local oscillator for downconversion, followed by a digital phase locked loop, or a one-shot carrier frequency/phase estimate, implemented in complex baseband.

Energy and power: The energy of a passband signal equals that of its complex envelope, up to a scale factor which depends on the particular convention we adopt. In particular, for the convention in (2.68), we have

$$\|u_p\|^2 = \frac{1}{2}\|u\|^2 = \frac{1}{2}\left(\|u_c\|^2 + \|u_s\|^2\right) \qquad (2.86)$$

That is, the energy equals the sum of the energies of the I and Q components, up to a scalar constant. The same relationship holds for the powers of finite-power passband signals and their complex envelopes, since power is computed as a time average of energy. To show (2.86), consider

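$$\begin{aligned} \|u_p\|^2 &= \int \left(u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t\right)^2 dt \\ &= \int u_c^2(t)\cos^2(2\pi f_c t)\,dt + \int u_s^2(t)\sin^2(2\pi f_c t)\,dt - 2\int u_c(t)\cos 2\pi f_c t\; u_s(t)\sin 2\pi f_c t\,dt \end{aligned}$$

The I-Q cross term drops out due to I-Q orthogonality, so that we are left with the I-I and Q-Q terms, as follows:

$$\|u_p\|^2 = \int u_c^2(t)\cos^2(2\pi f_c t)\,dt + \int u_s^2(t)\sin^2(2\pi f_c t)\,dt$$

Now, $\cos^2 2\pi f_c t = \frac{1}{2} + \frac{1}{2}\cos 4\pi f_c t$ and $\sin^2 2\pi f_c t = \frac{1}{2} - \frac{1}{2}\cos 4\pi f_c t$. We therefore obtain

$$\|u_p\|^2 = \frac{1}{2}\int u_c^2(t)\,dt + \frac{1}{2}\int u_s^2(t)\,dt + \frac{1}{2}\int u_c^2(t)\cos 4\pi f_c t\,dt - \frac{1}{2}\int u_s^2(t)\cos 4\pi f_c t\,dt$$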

The  last  two  terms  are  zero,  since  they  are  equal  to  the  DC  components  of  passband  waveforms 
centered  around  2fc,  arguing  in  exactly  the  same  fashion  as  in  our  derivation  of  I-Q  orthogonality. 
This  gives  the  desired  result  (2.86). 
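A quick numerical sanity check of (2.86) follows, assuming NumPy and approximating the energy integrals by Riemann sums; the Gaussian-shaped I and Q components, the carrier and the sample rate are illustrative choices only.

```python
import numpy as np

fs, fc = 10_000.0, 500.0                       # assumed sample rate and carrier (Hz)
t = np.arange(-20_000, 20_000) / fs            # [-2, 2) s
uc = np.exp(-np.pi * t**2)                     # hypothetical I component
us = t * np.exp(-np.pi * t**2)                 # hypothetical Q component
up = uc * np.cos(2 * np.pi * fc * t) - us * np.sin(2 * np.pi * fc * t)


def energy(x):
    """Riemann-sum approximation of the energy integral of x(t)."""
    return np.sum(np.abs(x) ** 2) / fs


E_pass = energy(up)                            # ||up||^2
E_iq = 0.5 * (energy(uc) + energy(us))         # right-hand side of (2.86)
E_env = 0.5 * energy(uc + 1j * us)             # (1/2) ||u||^2
print(E_pass, E_iq)
```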

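Correlation between two signals: The correlation, or inner product, of two real-valued passband signals $u_p$ and $v_p$ is defined as

$$\langle u_p, v_p\rangle = \int_{-\infty}^{\infty} u_p(t)\, v_p(t)\,dt$$

Using exactly the same reasoning as above, we can show that

$$\langle u_p, v_p\rangle = \frac{1}{2}\left(\langle u_c, v_c\rangle + \langle u_s, v_s\rangle\right) \qquad (2.87)$$

That is, we can implement a passband correlation by first downconverting, and then employing baseband operations: correlating I against I, and Q against Q, and then summing the results. It is also worth noting how this is related to the complex baseband inner product, which is defined as

$$\begin{aligned} \langle u, v\rangle &= \int_{-\infty}^{\infty} u(t)\, v^*(t)\,dt = \int_{-\infty}^{\infty} \left(u_c(t) + j u_s(t)\right)\left(v_c(t) - j v_s(t)\right) dt \\ &= \left(\langle u_c, v_c\rangle + \langle u_s, v_s\rangle\right) + j\left(\langle u_s, v_c\rangle - \langle u_c, v_s\rangle\right) \end{aligned}$$

Comparing with (2.87), we obtain that

$$\langle u_p, v_p\rangle = \frac{1}{2}\mathrm{Re}\left(\langle u, v\rangle\right) \qquad (2.88)$$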

That is, the passband inner product is the real part of the complex baseband inner product (up to a scale factor). Does the imaginary part of the complex baseband inner product have any meaning?
Indeed  it  does:  it  becomes  important  when  there  is  phase  uncertainty  in  the  downconversion 
operation,  which  causes  the  I  and  Q  components  to  leak  into  each  other.  However,  we  postpone 
discussion  of  such  issues  to  later  chapters. 
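The identity (2.88) can be checked the same way. The NumPy sketch below uses hypothetical envelopes, carrier and sample rate, with Riemann sums for the integrals. It also rotates the second envelope by $90^\circ$, i.e. replaces $v$ by $jv$: since $\langle u, jv\rangle = -j\langle u, v\rangle$, the passband correlation then equals $\frac{1}{2}\mathrm{Im}\langle u, v\rangle$, which is one simple way to see how a phase shift moves the imaginary part of the baseband inner product into the passband correlation.

```python
import numpy as np

fs, fc = 10_000.0, 500.0                         # assumed sample rate and carrier (Hz)
t = np.arange(-20_000, 20_000) / fs
g = np.exp(-np.pi * t**2)
u = g * (1.0 + 0.5j * t)                         # hypothetical complex envelopes
v = g * (0.3 + t - 1.0j)


def passband(x):
    """Real passband signal Re(x(t) exp(j 2 pi fc t)), as in (2.72)."""
    return np.real(x * np.exp(2j * np.pi * fc * t))


def inner(x, y):
    """Riemann-sum approximation of <x, y> = int x(t) y*(t) dt."""
    return np.sum(x * np.conj(y)) / fs


z = inner(u, v)                                        # complex baseband inner product
corr = inner(passband(u), passband(v)).real            # passband correlation
corr_rot = inner(passband(u), passband(1j * v)).real   # second signal shifted by 90 degrees
print(corr, 0.5 * z.real, corr_rot, 0.5 * z.imag)
```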
2.9 Wireless Channel Modeling in Complex Baseband


We  now  provide  a  glimpse  of  wireless  channel  modeling  using  complex  baseband.  There  are  two 
key  differences  between  wireless  and  wireline  communication.  The  first,  which  is  what  we  focus 
on  now,  is  multipath  propagation  due  to  reflections  off  of  scatterers  adding  up  at  the  receiver. 
This  addition  can  be  constructive  or  destructive  (as  we  saw  in  Example  2.5.4),  and  is  sensitive  to 
small  changes  in  the  relative  location  of  the  transmitter  and  receiver  which  produce  changes  in 
the  relative  delays  of  the  various  paths.  The  resulting  fluctuations  in  signal  strength  are  termed 
fading.  The  second  key  feature  of  wireless,  which  we  explore  in  a  different  wireless  module, 
is  interference:  wireless  is  a  broadcast  medium,  hence  the  receiver  can  also  hear  transmissions 
other than the one it is interested in. We now explore the effects of multipath fading for some simple scenarios. While we just made up the example impulse response in Example 2.5.4, we now consider more detailed, but still simplified, models of the propagation environment and the associated channel models.

Consider a passband transmitted signal at carrier frequency $f_c$, of the form

$$u_p(t) = u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t = e(t)\cos\left(2\pi f_c t + \theta(t)\right)$$

where

$$u(t) = u_c(t) + j u_s(t) = e(t)\, e^{j\theta(t)}$$

is the complex baseband representation, or complex envelope. In order to model the propagation of this signal through a multipath environment, let us consider its propagation through a path of length $r$. The propagation attenuates the field by a factor of $1/r$, and introduces a delay of $\tau(r) = r/c$, where $c$ denotes the speed of light. Suppressing the dependence of $\tau$ on $r$, the received signal is given by

$$v_p(t) = \frac{A}{r}\, e(t - \tau)\cos\left(2\pi f_c(t - \tau) + \theta(t - \tau) - \phi\right)$$

where we consider relative values (across paths) for the constants $A$ and $\phi$. The complex envelope of $v_p(t)$ with respect to the reference $e^{j 2\pi f_c t}$ is given by

$$v(t) = \frac{A}{r}\, e^{-j(2\pi f_c\tau + \phi)}\, u(t - \tau) \qquad (2.89)$$

For example, we may take $A = 1$, $\phi = 0$ for a direct, or line of sight (LOS), path from transmitter to receiver, which we may take as a reference. Figure 2.38 shows the geometry for a reflected path corresponding to a single bounce, relative to the LOS path. Following standard terminology, $\theta_i$ denotes the angle of incidence, and $\theta_g = \frac{\pi}{2} - \theta_i$ the grazing angle. The change in relative amplitude and phase due to the reflection depends on the carrier frequency, the reflector material, the angle of incidence, and the polarization with respect to the orientation of the reflector surface. Since we do not wish to get into the underlying electromagnetics, we consider simplified models of relative amplitude and phase. In particular, we note that for grazing incidence ($\theta_g \approx 0$), we have $A \approx 1$, $\phi \approx \pi$.

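Generalizing (2.89) to multiple paths of length $r_1, r_2, \ldots$, the complex envelope of the received signal is given by

$$v(t) = \sum_i \frac{A_i}{r_i}\, e^{-j(2\pi f_c\tau_i + \phi_i)}\, u(t - \tau_i) \qquad (2.90)$$

where $\tau_i = r_i/c$, and $A_i$, $\phi_i$ depend on the reflector characteristics and incidence angle for the $i$th ray. This corresponds to the complex baseband channel impulse response

$$h(t) = \sum_i \frac{A_i}{r_i}\, e^{-j(2\pi f_c\tau_i + \phi_i)}\, \delta(t - \tau_i) \qquad (2.91)$$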
Figure 2.38: Ray tracing for a single bounce path. We can reflect the transmitter around the reflector to create a virtual source. The line between the virtual source and the receiver tells us where the ray will hit the reflector, following the law of reflection that the angles of incidence and reflection must be equal. The length of the line equals the length of the reflected ray to be plugged into (2.92).


This is in exact correspondence with our original multipath model (2.36), with $\alpha_i = \frac{A_i}{r_i}\, e^{-j(2\pi f_c\tau_i + \phi_i)}$.

The corresponding frequency domain response is given by

$$H(f) = \sum_i \frac{A_i}{r_i}\, e^{-j(2\pi f_c\tau_i + \phi_i)}\, e^{-j 2\pi f\tau_i} \qquad (2.92)$$

Since we are modeling in complex baseband, $f$ takes values around DC, with $f = 0$ corresponding to the passband reference frequency $f_c$.
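Equations (2.91) and (2.92) are straightforward to evaluate numerically. The NumPy sketch below uses hypothetical path lengths, relative amplitudes and phases (not values from the text), takes $c = 3 \times 10^8$ m/s, and the helper names are our own. For two paths, $|H(f)|$ is periodic in $f$ with period $1/(\tau_2 - \tau_1)$, and its peak is $A_1/r_1 + A_2/r_2$.

```python
import numpy as np

C = 3e8  # speed of light (m/s)


def path_gains(fc, r, A, phi):
    """Delays tau_i = r_i / c and complex gains alpha_i, as in (2.91)."""
    r, A, phi = (np.asarray(x, dtype=float) for x in (r, A, phi))
    tau = r / C
    alpha = (A / r) * np.exp(-1j * (2 * np.pi * fc * tau + phi))
    return tau, alpha


def channel_response(f, fc, r, A, phi):
    """Complex baseband frequency response H(f) from (2.92), f in Hz around DC."""
    tau, alpha = path_gains(fc, r, A, phi)
    f = np.atleast_1d(np.asarray(f, dtype=float))
    return np.exp(-2j * np.pi * np.outer(f, tau)) @ alpha


# Hypothetical two-path scenario: LOS path plus a path 30 m longer with phi = pi.
fc = 2.4e9
r, A, phi = [100.0, 130.0], [1.0, 1.0], [0.0, np.pi]
f = np.linspace(-20e6, 20e6, 4001)
H = channel_response(f, fc, r, A, phi)
```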

Channel delay spread and coherence bandwidth: We have already introduced these concepts in Example 2.5.4, but reiterate them here. Let $\tau_{\min}$ and $\tau_{\max}$ denote the minimum and maximum of the delays $\{\tau_i\}$. The difference $\tau_d = \tau_{\max} - \tau_{\min}$ is called the channel delay spread. The reciprocal of the delay spread is termed the channel coherence bandwidth, $B_c = 1/\tau_d$. A baseband signal of bandwidth $W$ is said to be narrowband if $W\tau_d = W/B_c \ll 1$, or equivalently, if its bandwidth is significantly smaller than the channel coherence bandwidth.

We can now infer that, for a narrowband signal around the reference frequency, the received complex baseband signal equals a delayed version of the transmitted signal, scaled by the complex channel gain

$$h = H(0) = \sum_i \frac{A_i}{r_i}\, e^{-j(2\pi f_c\tau_i + \phi_i)} \qquad (2.93)$$


Example 2.9.1 (Two ray model) Suppose our propagation environment consists of the LOS ray and the single reflected ray shown in Figure 2.38. Then we have two rays, with $r_1 = \sqrt{R^2 + (h_r - h_t)^2}$ and $r_2 = \sqrt{R^2 + (h_r + h_t)^2}$. The corresponding delays are $\tau_i = r_i/c$, $i = 1, 2$, where $c$ denotes the speed of propagation. The grazing angle is given by $\theta_g = \tan^{-1}\frac{h_t + h_r}{R}$. Setting $A_1 = 1$ and $\phi_1 = 0$, once we specify $A_2$ and $\phi_2$ for the reflected path, we can specify the complex baseband channel. Numerical examples are explored in Problem 2.21, and in Software Lab 2.2.
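The geometry of Example 2.9.1 takes only a few lines to compute. This Python sketch assumes free-space propagation at $c = 3 \times 10^8$ m/s. The geometry in the usage line is a hypothetical case, chosen so that it does not reproduce the numbers asked for in Problem 2.21 or Software Lab 2.2.

```python
import math

C = 3e8  # propagation speed in m/s (free-space assumption)


def two_ray(R, ht, hr, c=C):
    """Path lengths, delays, delay spread and grazing angle for Example 2.9.1."""
    r1 = math.hypot(R, hr - ht)            # LOS ray
    r2 = math.hypot(R, hr + ht)            # ground bounce, via the virtual source
    tau1, tau2 = r1 / c, r2 / c
    theta_g = math.atan((ht + hr) / R)     # grazing angle (radians)
    return {"r1": r1, "r2": r2, "tau1": tau1, "tau2": tau2,
            "delay_spread": tau2 - tau1, "theta_g": theta_g}


# Hypothetical geometry (not one of the cases in Problem 2.21 or Software Lab 2.2).
g = two_ray(R=100.0, ht=3.0, hr=6.0)
print(f"r2 - r1 = {g['r2'] - g['r1']:.4f} m, "
      f"delay spread = {g['delay_spread'] * 1e9:.3f} ns, "
      f"grazing angle = {math.degrees(g['theta_g']):.2f} deg")
```

For this geometry, the path difference is roughly 0.36 m. That corresponds to a delay spread of roughly 1.2 ns and a grazing angle of roughly 5.1 degrees.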




2.10  Concept  Inventory 


In addition to a review of basic signals and systems concepts such as convolution and Fourier transforms, the main focus of this chapter is to develop the complex baseband representation of passband signals, and to emphasize its crucial role in modeling and implementation of communication systems.

Review 

• Euler's formula: $e^{j\theta} = \cos\theta + j\sin\theta$

• Important signals: delta function (sifting property), indicator function, complex exponential, sinusoid, sinc

•  Signals  analogous  to  vectors:  Inner  product,  energy  and  norm 

•  LTI  systems:  impulse  response,  convolution,  complex  exponentials  as  eigenfunctions,  multipath 
channel  modeling 

• Fourier series: complex exponentials or sinusoids as basis for periodic signals, conjugate symmetry for real-valued signals, Parseval's identity, use of differentiation to simplify computation

• Fourier transform: standard pairs (sinc and boxcar, impulse and constant), effect of time delay and frequency shift, conjugate symmetry for real-valued signals, Parseval's identity, use of differentiation to simplify computation, numerical computation using DFT

• Bandwidth: for physical signals, given by occupancy of positive frequencies; energy spectral density equals magnitude squared of Fourier transform; computation of fractional energy containment bandwidth from energy spectral density


Complex  baseband  representation 

•  Complex  envelope  of  passband  signal:  rectangular  form  (I  and  Q  components),  polar  form 
(envelope  and  phase),  upconversion  and  downconversion,  orthogonality  of  I  and  Q  components 
(under  ideal  synchronization),  frequency  domain  relationship  between  passband  signal  and  its 
complex  envelope 

• Passband filtering can be accomplished in complex baseband

•  Passband  inner  product  and  energy  in  terms  of  complex  baseband  quantities 

Modeling  in  complex  baseband 

•  Frequency  and  phase  offsets:  rotating  phasor  multiplying  complex  envelope,  derotation  to  undo 
offsets 

•  Wireless  multipath  channel:  impulse  response  modeled  as  sum  of  impulses  with  complex-valued 
coefficients,  ray  tracing,  delay  spread  and  coherence  bandwidth 


2.11  Endnotes 


A detailed treatment of the material reviewed in Sections 2.1-2.5 can be found in basic textbooks on signals and systems such as Oppenheim, Willsky and Nawab [17] or Lathi [18].

The  Matlab  code  fragments  and  software  labs  interspersed  in  this  textbook  provide  a  glimpse  of 
the  use  of  DSP  in  communication.  However,  for  a  background  in  core  DSP  algorithms,  we  refer 
the  reader  to  textbooks  such  as  Oppenheim  and  Schafer  [19]  and  Mitra  [20]. 
Problems


LTI  systems  and  Convolution 


Problem 2.1 A system with input $x(t)$ has output given by

$$y(t) = \int_{-\infty}^{t} e^{u-t} x(u)\, du$$


(a) Show that the system is LTI and find its impulse response.

(b) Find the transfer function $H(f)$ and plot $|H(f)|$.

(c) If the input $x(t) = 2\,\mathrm{sinc}(2t)$, find the energy of the output.


Problem 2.2 Find and sketch $y = x_1 * x_2$ for the following:

(a) $x_1(t) = e^{-t} I_{[0,\infty)}(t)$, $x_2(t) = x_1(-t)$.

(b) $x_1(t) = I_{[0,2]}(t) - 3 I_{[1,4]}(t)$, $x_2(t) = I_{[0,1]}(t)$.

Hint:  In  (b),  you  can  use  the  LTI  property  and  the  known  result  in  Figure  2.12  on  the  convolution 
of  two  boxes. 


Fourier  Series 


Problem 2.3 A digital circuit generates the following periodic waveform with period 0.5:

$$u(t) = \begin{cases} 1, & 0 \le t < 0.1 \\ 0, & 0.1 \le t < 0.5 \end{cases}$$

where the unit of time is microseconds throughout this problem.

(a) Find the complex exponential Fourier series for $du/dt$.

(b) Find the complex exponential Fourier series for $u(t)$, using the results of (a).

(c) Find an explicit time domain expression for the output when $u(t)$ is passed through an ideal lowpass filter of bandwidth 100 kHz.

(d) Repeat (c) when the filter bandwidth is increased to 300 kHz.

(e) Find an explicit time domain expression for the output when $u(t)$ is passed through a filter with impulse response $h_2(t) = \mathrm{sinc}(t)\cos(8\pi t)$.

(f) Can you generate a sinusoidal waveform of frequency 1 MHz by appropriately filtering $u(t)$? If so, specify in detail how you would do it.


Fourier  Transform  and  Bandwidth 

Problem 2.4 Find and sketch the Fourier transforms for the following signals:

(a) $u(t) = (1 - |t|) I_{[-1,1]}(t)$.

(b) $v(t) = \mathrm{sinc}(2t)\,\mathrm{sinc}(4t)$.

(c) $s(t) = v(t)\cos 200\pi t$.

(d) Classify each of the signals in (a)-(c) as baseband or passband.

Problem 2.5 Use Parseval's identity to compute the following integrals:

(a) $\int_{-\infty}^{\infty} \mathrm{sinc}^2(2t)\, dt$

(b) $\int_{-\infty}^{\infty} \mathrm{sinc}(t)\,\mathrm{sinc}(2t)\, dt$
Problem 2.6 (a) For $u(t) = \mathrm{sinc}(t)\,\mathrm{sinc}(2t)$, where $t$ is in microseconds, find and plot the magnitude spectrum $|U(f)|$, carefully labeling the units of frequency on the x axis.

(b) Now, consider $s(t) = u(t)\cos 200\pi t$. Plot the magnitude spectrum $|S(f)|$, again labeling the units of frequency and carefully showing the frequency intervals over which the spectrum is nonzero.

Problem 2.7 The signal $s(t) = \mathrm{sinc}(4t)$ is passed through a filter with impulse response $h(t) = \mathrm{sinc}^2(t)\cos 4\pi t$ to obtain output $y(t)$. Find and sketch the Fourier transform $Y(f)$ of the output (sketch the real and imaginary parts separately if the spectrum is complex-valued).

Problem 2.8 Consider the tent signal $s(t) = (1 - |t|) I_{[-1,1]}(t)$.

(a)  Find  and  sketch  the  Fourier  transform  S{f). 

(b) Compute the 99% energy containment bandwidth in kHz, assuming that the unit of time is milliseconds.


Problem 2.9 Consider the cosine pulse

$$p(t) = \cos \pi t \; I_{[-1/2,1/2]}(t)$$

(a) Show that the Fourier transform of this pulse is given by

$$P(f) = \frac{2\cos \pi f}{\pi(1 - 4f^2)}$$

(b) Use this result to derive the formula (2.63) for the sine pulse in Example 2.5.5.


Problem 2.10 (Numerical computation of the Fourier transform) Modify Code Fragment 2.5.1 for Example 2.5.5 to numerically compute the Fourier transform of the tent function in Problem 2.8. Display the magnitude spectra of the DFT-based numerically computed Fourier transform and the analytically computed Fourier transform (from Problem 2.8) in the same plot, over the frequency interval $[-10, 10]$. Comment on the accuracy of the DFT-based computation.


Introducing  the  matched  filter 

Problem 2.11 For a signal $s(t)$, the matched filter is defined as a filter with impulse response $h(t) = s_{mf}(t) = s^*(-t)$ (we allow signals to be complex valued, since we want to handle complex baseband signals as well as physical real-valued signals).

(a) Sketch the matched filter impulse response for $s(t) = I_{[1,3]}(t)$.

(b) Find and sketch the convolution $y(t) = (s * s_{mf})(t)$. This is the output when the signal is passed through its matched filter. Where does the peak of the output occur?

(c) (True or False) $Y(f) \ge 0$ for all $f$.

Problem 2.12 Repeat Problem 2.11 for $s(t) = I_{[1,3]}(t) - 2 I_{[2,5]}(t)$.


Introducing  delay  spread  and  coherence  bandwidth 

Problem 2.13 A wireless channel has impulse response given by $h(t) = 2\delta(t - 0.1) + j\delta(t - 0.64) - 0.8\delta(t - 2.2)$, where the unit of time is in microseconds.

(a)  What  is  the  delay  spread  and  coherence  bandwidth? 
(b) Plot the magnitude and phase of the channel transfer function $H(f)$ over the interval $[-2B_c, 2B_c]$, where $B_c$ denotes the coherence bandwidth computed in (a). Comment on how the phase behaves when $|H(f)|$ is small.

(c) Express $|H(f)|$ in dB, taking 0 dB as the gain of a nominal channel $h_{nom}(t) = 2\delta(t - 0.1)$ corresponding to the first ray alone. What are the fading depths that you see with respect to this nominal?

Define the average channel power gain over a band $[-W/2, W/2]$ as

$$G(W) = \frac{1}{W}\int_{-W/2}^{W/2} |H(f)|^2\, df$$

This is a simplified measure of how increasing signal bandwidth $W$ can help compensate for frequency-selective fading: we hope that, as $W$ gets large, we can average out fluctuations in $|H(f)|^2$.

(d) Plot $G(W)$ as a function of $W/B_c$ and comment on how large the bandwidth needs to be (as a multiple of $B_c$) to provide "enough averaging."


Complex  envelope  of  passband  signals 
Problem 2.14 Consider a passband signal of the form

$$u_p(t) = a(t)\cos 200\pi t$$

where $a(t) = \mathrm{sinc}(2t)$, and where the unit of time is in microseconds.

(a) What is the frequency band occupied by $u_p(t)$?

(b) The signal $u_p(t)\cos 199\pi t$ is passed through a lowpass filter to obtain an output $b(t)$. Give an explicit expression for $b(t)$, and sketch $B(f)$ (if $B(f)$ is complex-valued, sketch its real and imaginary parts separately).

(c) The signal $u_p(t)\sin 199\pi t$ is passed through a lowpass filter to obtain an output $c(t)$. Give an explicit expression for $c(t)$, and sketch $C(f)$ (if $C(f)$ is complex-valued, sketch its real and imaginary parts separately).

(d) Can you reconstruct $a(t)$ from simple real-valued operations performed on $b(t)$ and $c(t)$? If so, sketch a block diagram for the operations required. If not, say why not.


Figure 2.39: Operations involved in Problem 2.15 (figure label: $2\sin(400\pi t + \pi/4)$).
Problem 2.15 Consider the signal $s(t) = \cos 100\pi t$.

(a) Find and sketch the baseband signal $u(t)$ that results when $s(t)$ is downconverted as shown in the upper branch of Figure 2.39.

(b) The signal $s(t)$ is passed through the bandpass filter with impulse response h{t) = /[o^p (t) sin(4007rt+ ^). Find and sketch the baseband signal $v(t)$ that results when the filter output $y(t) = (s * h)(t)$ is downconverted as shown in the lower branch of Figure 2.39.

Problem 2.16 Consider the signals $u_1(t) = I_{[0,1]}(t)\cos 100\pi t$ and $u_2(t) = I_{[0,1]}(t)\sin 100\pi t$.

(a) Find the numerical value of the inner product $\int_{-\infty}^{\infty} u_1(t) u_2(t)\, dt$.

(b) Find an explicit time domain expression for the convolution $y(t) = (u_1 * u_2)(t)$.

(c) Sketch the magnitude spectrum $|Y(f)|$ for the convolution in (b).

Problem 2.17 Consider a real-valued passband signal $v_p(t)$ whose Fourier transform for positive frequencies is given by

$$\mathrm{Re}(V_p(f)) = \begin{cases} 2, & 30 < f < 32 \\ 0, & 0 < f < 30 \\ 0, & 32 < f < \infty \end{cases}$$

$$\mathrm{Im}(V_p(f)) = \begin{cases} 1 - |f - 32|, & 31 < f < 33 \\ 0, & 0 < f < 31 \\ 0, & 33 < f < \infty \end{cases}$$

(a) Sketch the real and imaginary parts of $V_p(f)$ for both positive and negative frequencies.

(b) Specify, in both the time domain and the frequency domain, the waveform that you get when you pass $v_p(t)\cos(60\pi t)$ through a lowpass filter.

Problem 2.18 The passband signal $u(t) = I_{[-1,1]}(t)\cos 100\pi t$ is passed through the passband filter $h(t) = I_{[0,3]}(t)\sin 100\pi t$. Find an explicit time domain expression for the filter output.

Problem 2.19 Consider the passband signal $u_p(t) = \mathrm{sinc}(t)\cos 20\pi t$, where the unit of time is in microseconds.

(a) Use Matlab to plot the signal (plot over a large enough time interval so as to include "most" of the signal energy). Label the units on the time axis.

Remark: Since you will be plotting a discretized version, the sampling rate you should choose should be large enough that the carrier waveform looks reasonably smooth (e.g., a rate of at least 10 times the carrier frequency).

(b) Write a Matlab program to implement a simple downconverter as follows. Pass $x(t) = 2u_p(t)\cos 20\pi t$ through a lowpass filter which consists of computing a sliding window average over a window of 1 microsecond. That is, the LPF output is given by $y(t) = \int_{t-1}^{t} x(\tau)\, d\tau$. Plot the output and comment on whether it is what you expect to see.


Problem  2.20  Consider  the  following  two  passband  signals: 


$u_p(t) = \mathrm{sinc}(2t)\cos 100\pi t$


and 


Vp{t)  =  sinc(f)  sin(10l7rt  -|- 


A' 


(a) Find the complex envelopes $u(t)$ and $v(t)$ for $u_p$ and $v_p$, respectively, with respect to the frequency reference $f_c = 50$.

(b) What is the bandwidth of $u_p(t)$? What is the bandwidth of $v_p(t)$?

(c) Find the inner product $\langle u_p, v_p \rangle$, using the result in (a).

(d) Find the convolution $y_p(t) = (u_p * v_p)(t)$, using the result in (a).
Wireless channel modeling


Problem 2.21 Consider the two-ray wireless channel model in Example 2.9.1.

(a) Show that, as long as the range $R \gg h_t, h_r$, the delay spread is well approximated as

$$\tau_d \approx \frac{2 h_t h_r}{R c}$$

where $c$ denotes the propagation speed. We assume free space propagation with $c = 3 \times 10^8$ m/s.

(b) Compare the approximation in (a) with the actual value of the delay spread for $R = 200$ m, $h_t = 2$ m, $h_r = 10$ m (e.g., modeling an outdoor link with LOS and single ground bounce).

(c) What is the coherence bandwidth for the numerical example in (b)?

(d) Redo (b) and (c) for $R = 10$ m, $h_t = h_r = 2$ m (e.g., a model for an indoor link modeling LOS plus a single wall bounce).

Problem 2.22 Consider $R = 200$ m, $h_t = 2$ m, $h_r = 10$ m in the two-ray wireless channel model in Example 2.9.1. Assume $A_1 = 1$ and $\phi_1 = 0$, set $A_2 = 0.95$ and $\phi_2 = \pi$, and assume that the carrier frequency is 5 GHz.

(a) Specify the channel impulse response, normalizing the LOS path to unit gain and zero delay. Make sure you specify the unit of time being used.

(b) Plot the magnitude and phase of the channel transfer function over $[-3B_c, 3B_c]$, where $B_c$ denotes the channel coherence bandwidth.

(c) Plot the frequency selective fading gain in dB over $[-3B_c, 3B_c]$, using a LOS channel as nominal. Comment on the fading depth.

(d) As in Problem 2.13, compute the frequency-averaged power gain $G(W)$ and plot it as a function of $W/B_c$. How much bandwidth is needed to average out the effects of frequency-selective fading?


Software  Lab  2.1:  Modeling  Carrier  Phase  Uncertainty 

Consider a pair of independently modulated signals, $u_c(t) = \sum_{n=1}^{N} b_c[n]\, p(t - n)$ and $u_s(t) = \sum_{n=1}^{N} b_s[n]\, p(t - n)$, where the symbols $b_c[n]$, $b_s[n]$ are chosen with equal probability to be $+1$ and $-1$, and $p(t) = I_{[0,1]}(t)$ is a rectangular pulse. Let $N = 100$.

(1.1) Use Matlab to plot a typical realization of $u_c(t)$ and $u_s(t)$ over 10 symbols. Make sure you sample fast enough for the plot to look reasonably "nice."

(1.2) Upconvert the baseband waveform $u_c(t)$ to get

$$u_{p,1}(t) = u_c(t)\cos 40\pi t$$

This is a so-called binary phase shift keyed (BPSK) signal, since the changes in phase are due to the changes in the signs of the transmitted symbols. Plot the passband signal $u_{p,1}(t)$ over four symbols (you will need to sample at a multiple of the carrier frequency for the plot to look nice, which means you might have to go back and increase the sampling rate beyond what was required for the baseband plots to look nice).

(1.3) Now, add in the Q component to obtain the passband signal

$$u_p(t) = u_c(t)\cos 40\pi t - u_s(t)\sin 40\pi t$$

Plot the resulting Quaternary Phase Shift Keyed (QPSK) signal $u_p(t)$ over four symbols.

(1.4) Downconvert $u_p(t)$ by passing $2u_p(t)\cos(40\pi t + \theta)$ and $2u_p(t)\sin(40\pi t + \theta)$ through crude lowpass filters with impulse response $h(t) = I_{[0,0.25]}(t)$. Denote the resulting I and Q components by $v_c(t)$ and $v_s(t)$, respectively. Plot $v_c$ and $v_s$ for $\theta = 0$ over 10 symbols. How do they compare to $u_c$ and $u_s$? Can you read off the corresponding bits $b_c[n]$ and $b_s[n]$ from eyeballing the plots for $v_c$ and $v_s$?

(1.5) Plot $v_c$ and $v_s$ for $\theta = \pi/4$. How do they compare to $u_c$ and $u_s$? Can you read off the corresponding bits $b_c[n]$ and $b_s[n]$ from eyeballing the plots for $v_c$ and $v_s$?

(1.6) Figure out how to recover $u_c$ and $u_s$ from $v_c$ and $v_s$ if a genie tells you the value of $\theta$ (we are looking for an approximate reconstruction: the LPFs used in downconversion are non-ideal, and the original waveforms are not exactly bandlimited). Check whether your method for undoing the phase offset works for $\theta = \pi/4$, the scenario in (1.5). Plot the resulting reconstructions $\hat{u}_c$ and $\hat{u}_s$, and compare them with the original I and Q components. Can you read off the corresponding bits $b_c[n]$ and $b_s[n]$ from eyeballing the plots for $\hat{u}_c$ and $\hat{u}_s$?


Software  Lab  2.2:  Modeling  a  lamppost  based  broadband  network 

The  background  for  this  lab  is  provided  in  Section  2.9,  which  discusses  wireless  channel  modeling. 
This  material  should  be  reviewed  prior  to  doing  the  lab. 


Figure 2.40: Links in a lamppost-based network (figure labels include Lamppost 2 and a direct path of 200 m).


Consider a lamppost-based network supplying broadband access using unlicensed spectrum at 5 GHz. Figure 2.40 shows two kinds of links: lamppost-to-lamppost for backhaul, and lamppost-to-mobile for access, where we show nominal values of antenna heights and distances. We explore simple channel models for each case, consisting only of the direct path and the ground reflection. For simplicity, assume throughout that $A_1 = 1$, $\phi_1 = 0$ for the direct path, and $A_2 = 0.98$, $\phi_2 = \pi$ for the ground reflection (we assume a phase shift of $\pi$ for the reflected ray even though it may not be at grazing incidence, especially for the lamppost-to-mobile link).

(2.1)  Find  the  delay  spread  and  coherence  bandwidth  for  the  lamppost-to-lamppost  link.  If  the 
message  signal  has  20  MHz  bandwidth,  is  it  “narrowband”  with  respect  to  this  channel? 

(2.2)  Repeat  item  (2.1)  for  the  lamppost-to-car  link  when  the  car  is  100  m  away  from  each 
lamppost. 

Fading  and  diversity  for  the  backhaul  link 

First,  let  us  explore  the  sensitivity  of  the  lamppost  to  lamppost  link  to  variations  in  range  and 
height. Fix the height of the transmitter on lamppost 1 at 10 m. Vary the height of the receiver on lamppost 2 from 9.5 to 10.5 m.

(2.3) Letting $h_{nom}$ denote the nominal channel gain between two lampposts if you only consider the direct path and $h$ the net complex gain including the reflected path, plot the normalized power gain in dB, $20\log_{10}|h/h_{nom}|$, as a function of the variation in the receiver height. Comment on the sensitivity of channel quality to variations in the receiver height.

(2.4) Modeling the variations in receiver height as coming from a uniform distribution over (9.5, 10.5), find the probability that the normalized power gain is smaller than -20 dB (i.e., that we have a fade in signal power of 20 dB or worse).

(2.5) Now, suppose that the transmitter has two antennas, vertically spaced by 25 cm, with the lower one at a height of 10 m. Let $h_1$ and $h_2$ denote the channels from the two antennas to the receiver. Let $h_{nom}$ be defined as in item (2.3). Plot the normalized power gains in dB, $20\log_{10}|h_i/h_{nom}|$, $i = 1, 2$. Comment on whether or not both gains dip or peak at the same time.

(2.6) Plot $20\log_{10}\frac{\max(|h_1|, |h_2|)}{|h_{nom}|}$, which is the normalized power gain you would get if you switched to the transmit antenna which has the better channel. This strategy is termed switched diversity.

(2.7)  Find  the  probability  that  the  normalized  power  gain  of  the  switched  diversity  scheme  is 
smaller  than  -20  dB. 

(2.8)  Comment  on  whether,  and  to  what  extent,  diversity  helped  in  combating  fading. 

Fading  on  the  access  link 

Consider the access channel from lamppost 1 to the car. Let $h_{nom}(D)$ denote the nominal channel gain from the lamppost to the car, ignoring the ground reflection. Taking into account the ground reflection, let the channel gain be denoted as $h(D)$. Here $D$ is the distance of the car from the bottom of lamppost 1, as shown in Figure 2.40.

(2.9) Plot $|h_{nom}|$ and $|h|$ as a function of $D$ on a dB scale (an amplitude $a$ is expressed on the dB scale as $20\log_{10} a$). Comment on the "long-term" variation due to range, and the "short-term" variation due to multipath fading.


Chapter  3 

Analog  Communication  Techniques 


Modulation  is  the  process  of  encoding  information  into  a  signal  that  can  be  transmitted  (or 
recorded)  over  a  channel  of  interest.  In  analog  modulation,  a  baseband  message  signal,  such 
as  speech,  audio  or  video,  is  directly  transformed  into  a  signal  that  can  be  transmitted  over 
a  designated  channel,  typically  a  passband  radio  frequency  (RF)  channel.  Digital  modulation 
differs  from  this  only  in  the  following  additional  step:  bits  are  encoded  into  baseband  message 
signals,  which  are  then  transformed  into  passband  signals  to  be  transmitted.  Thus,  despite 
the relentless transition from analog to digital modulation, many of the techniques developed for
analog  communication  systems  remain  important  for  the  digital  communication  systems  designer, 
and  our  goal  in  this  chapter  is  to  study  an  important  subset  of  these  techniques,  using  legacy 
analog  communication  systems  as  examples  to  reinforce  concepts. 

From  Chapter  2,  we  know  that  passband  signals  carry  information  in  their  complex  envelope, 
and  that  the  complex  envelope  can  be  represented  either  in  terms  of  I  and  Q  components,  or  in 
terms of envelope and phase. We study two broad classes of techniques: amplitude modulation, in which the analog message signal appears directly in the I and/or Q components; and
angle  modulation,  in  which  the  analog  message  signal  appears  directly  in  the  phase  or  in  the 
instantaneous  frequency  (i.e.,  in  the  derivative  of  the  phase),  of  the  transmitted  signal.  Examples 
of  analog  communication  in  space  include  AM  radio,  FM  radio,  and  broadcast  television,  as  well 
as  a  variety  of  specialized  radios.  Examples  of  analog  communication  in  time  (i.e.,  for  storage) 
include  audiocassettes  and  VHS  videotapes. 

The analog-centric techniques covered in this chapter include envelope detection, superheterodyne reception, limiter discriminators, and phase locked loops. At a high level, these techniques
tell  us  how  to  go  from  baseband  message  signals  to  passband  transmitted  signals,  and  back 
from  passband  received  signals  to  baseband  message  signals.  For  analog  communication,  this 
is  enough,  since  we  consider  continuous  time  message  signals  which  are  directly  transformed  to 
passband  through  amplitude  or  angle  modulation.  For  digital  communication,  we  need  to  also 
figure out how to decode the encoded bits from the received passband signal, typically after downconversion to baseband; this is a subject discussed in later chapters. However, between encoding
at  the  transmitter  and  decoding  at  the  receiver,  a  number  of  analog  communication  techniques 
are  relevant:  for  example,  we  need  to  decide  between  direct  and  superheterodyne  architectures 
for  upconversion  and  downconversion,  and  tailor  our  frequency  planning  appropriately;  we  may 
use a PLL to synthesize the local oscillator frequencies at the transmitter and receiver; and
the  basic  techniques  for  mapping  baseband  signals  to  passband  remain  the  same  (amplitude 
and/or  angle  modulation).  In  addition,  while  many  classical  analog  processing  functionalities 
are  replaced  by  digital  signal  processing  in  modern  digital  communication  transceivers,  when  we 
push  the  limits  of  digital  communication  systems,  in  terms  of  lowering  power  consumption  or 
increasing  data  rates,  it  is  often  necessary  to  fall  back  on  analog-centric,  or  hybrid  digital-analog, 
techniques. This is because the analog-to-digital conversion required for digital transceiver implementations may often be too costly or power-hungry for ultra high-speed, or ultra low-power, implementations.

Chapter  Plan:  After  a  quick  discussion  of  terminology  and  notation  in  Section  3.1,  we  discuss 
various  forms  of  amplitude  modulation  in  Section  3.2,  including  bandwidth  requirements  and  the 
tradeoffs  between  power  efficiency  and  simplicity  of  demodulation.  We  discuss  angle  modulation 
in  Section  3.3,  including  the  relation  between  phase  and  frequency  modulation,  the  bandwidth 
of angle modulated signals, and simple suboptimal demodulation strategies. The superheterodyne up/downconversion architecture is discussed in Section 3.4, with the design considerations illustrated via the example of analog AM radio. The phase locked loop (PLL) is discussed in Section 3.5, including discussion of applications such as frequency synthesis and FM demodulation, linearized modeling and analysis, and a glimpse of the insights provided by nonlinear
models.  Finally,  we  discuss  some  legacy  analog  communication  systems  in  Section  3.6,  mainly  to 
highlight  some  of  the  creative  design  choices  that  were  made  in  times  when  sophisticated  digital 
signal  processing  techniques  were  not  available.  This  last  section  can  be  skipped  if  the  reader’s 
interest  is  limited  to  learning  analog-centric  techniques  for  digital  communication  system  design. 


3.1  Terminology  and  notation 

Message Signal: In the remainder of this chapter, the analog baseband message signal is denoted by $m(t)$. Depending on convenience of exposition, we shall think of this message as either finite power or finite energy. Any message we encounter in practice has finite energy when we consider a finite time interval. However, when modeling transmissions over long time intervals, it is useful to think of messages as finite power signals spanning an infinite time interval. On the other hand, when discussing the effect of the message spectrum on the spectrum of the transmitted signal, it may be convenient to consider a finite energy message signal. Since we consider physical message signals, the time domain signal is real-valued, so that its Fourier transform (defined for a finite energy signal) is conjugate symmetric: $M(f) = M^*(-f)$. For a finite power (infinite energy) message, recall from Chapter 2 that the power is defined as a time average in the limit of an infinite observation interval, as follows:

$$\overline{m^2} = \lim_{T_o \to \infty} \frac{1}{T_o}\int_0^{T_o} m^2(t)\, dt$$

Similarly, the DC value is defined as

$$\overline{m} = \lim_{T_o \to \infty} \frac{1}{T_o}\int_0^{T_o} m(t)\, dt$$

We typically assume that the DC value of the message is zero: $\overline{m} = 0$.

A simple example, shown in Figure 3.1, that we shall use often is a finite-power sinusoidal message signal, $m(t) = A_m \cos 2\pi f_m t$, whose spectrum consists of impulses at $\pm f_m$: $M(f) = \frac{A_m}{2}\left(\delta(f - f_m) + \delta(f + f_m)\right)$. For this message, $\overline{m} = 0$ and $\overline{m^2} = A_m^2/2$.
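As a numerical sanity check on $\overline{m} = 0$ and $\overline{m^2} = A_m^2/2$, the following Python sketch replaces the limit $T_o \to \infty$ with a finite window and approximates the integrals with a midpoint Riemann sum. The amplitude, frequency, window length, and number of samples are all illustrative assumptions.

```python
import math


def time_averages(m, T0, n=100_000):
    """Finite-window stand-ins for the DC value and power definitions.

    Uses a midpoint Riemann sum of (1/T0) * integral_0^T0 of m(t) and m(t)**2;
    T0 replaces the limit T0 -> infinity, so the results are approximations.
    """
    dt = T0 / n
    samples = [m((k + 0.5) * dt) for k in range(n)]
    dc = sum(samples) / n
    power = sum(s * s for s in samples) / n
    return dc, power


# Illustrative sinusoidal message: A_m = 2, f_m = 1 kHz, averaged over 100 periods.
Am, fm = 2.0, 1e3
dc, power = time_averages(lambda t: Am * math.cos(2 * math.pi * fm * t), T0=0.1)
print(dc, power, Am**2 / 2)
```

Because the window spans a whole number of periods, the DC estimate comes out essentially zero and the power estimate matches $A_m^2/2 = 2$. For a window that cuts a period short, both estimates would carry an error that shrinks as $T_o$ grows.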

Transmitted Signal: When the signal transmitted over the channel is a passband signal, it can be written as (see Chapter 2)

$$u_p(t) = u_c(t)\cos(2\pi f_c t) - u_s(t)\sin(2\pi f_c t) = e(t)\cos(2\pi f_c t + \theta(t))$$

where $f_c$ is a carrier frequency, $u_c(t)$ is the I component, $u_s(t)$ is the Q component, $e(t) \ge 0$ is the envelope, and $\theta(t)$ is the phase. Modulation consists of encoding the message in $u_c(t)$ and $u_s(t)$, or equivalently, in $e(t)$ and $\theta(t)$. In most of the analog amplitude modulation schemes considered, the message modulates the I component (with the Q component occasionally playing a "supporting role") as discussed in Section 3.2. The exception is quadrature amplitude modulation, in which both I and Q components carry separate messages. In phase and frequency modulation, or angle modulation, the message directly modulates the phase $\theta(t)$ or its derivative, keeping the envelope $e(t)$ unchanged.

Figure 3.1: Sinusoidal message and its spectrum. (a) Sinusoidal message waveform; (b) sinusoidal message spectrum.
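The two forms of $u_p(t)$ are related by $u_c = e\cos\theta$ and $u_s = e\sin\theta$. The Python sketch below converts between them at a single instant and checks that both forms produce the same passband sample. To keep the example short, the I/Q values are held constant; the values and the carrier frequency are illustrative assumptions.

```python
import math


def iq_to_envelope_phase(uc, us):
    """Envelope e = sqrt(uc^2 + us^2) >= 0 and phase theta = atan2(us, uc)."""
    return math.hypot(uc, us), math.atan2(us, uc)


def envelope_phase_to_iq(e, theta):
    """Inverse map: uc = e cos(theta), us = e sin(theta)."""
    return e * math.cos(theta), e * math.sin(theta)


def passband_iq(t, uc, us, fc):
    """u_p(t) = uc cos(2 pi fc t) - us sin(2 pi fc t)."""
    return uc * math.cos(2 * math.pi * fc * t) - us * math.sin(2 * math.pi * fc * t)


def passband_polar(t, e, theta, fc):
    """u_p(t) = e cos(2 pi fc t + theta)."""
    return e * math.cos(2 * math.pi * fc * t + theta)


# Illustrative values: I = 0.6, Q = -0.8 (held constant), fc = 10 Hz.
uc, us, fc = 0.6, -0.8, 10.0
e, theta = iq_to_envelope_phase(uc, us)
print(e, theta)
for t in (0.0, 0.013, 0.07):
    print(passband_iq(t, uc, us, fc), passband_polar(t, e, theta, fc))
```

`atan2` is used rather than `atan(us / uc)` so that the phase lands in the correct quadrant even when $u_c \le 0$.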


3.2  Amplitude  Modulation 

We  now  discuss  a  number  of  variants  of  amplitude  modulation,  in  which  the  baseband  message 
signal  modulates  the  amplitude  of  a  sinusoidal  carrier  whose  frequency  falls  in  the  passband  over 
which  we  wish  to  communicate. 


3.2.1  Double  Sideband  (DSB)  Suppressed  Carrier  (SC) 

Here, the message $m$ modulates the I component of the passband transmitted signal $u$ as follows:

$$u_{DSB}(t) = A\, m(t)\cos(2\pi f_c t) \qquad (3.1)$$

Taking Fourier transforms, we have

$$U_{DSB}(f) = \frac{A}{2}\left(M(f - f_c) + M(f + f_c)\right) \qquad (3.2)$$

The time domain and frequency domain DSB signals for a sinusoidal message are shown in Figure 3.2.
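For the sinusoidal message, (3.1) and the product-to-sum identity give $u_{DSB}(t) = \frac{A A_m}{2}\left[\cos 2\pi(f_c - f_m)t + \cos 2\pi(f_c + f_m)t\right]$. The spectrum therefore has impulses of weight $A A_m/4$ at $\pm(f_c \pm f_m)$ and nothing at $\pm f_c$. The Python sketch below checks this on sampled data by computing single DFT-style bins. The sampling rate, frequencies, and amplitudes are illustrative assumptions, chosen so that every tone falls exactly on a bin.

```python
import cmath
import math


def tone_coefficient(x, f, fs):
    """(1/N) * sum_n x[n] exp(-j 2 pi f n / fs): a single DFT-style bin at f Hz."""
    N = len(x)
    return sum(xn * cmath.exp(-2j * math.pi * f * n / fs) for n, xn in enumerate(x)) / N


# Illustrative parameters: 1 s of samples, so integer-Hz tones fall exactly on bins.
fs, N = 1000.0, 1000
fc, fm, A, Am = 100.0, 10.0, 1.0, 2.0
x = [A * Am * math.cos(2 * math.pi * fm * n / fs) * math.cos(2 * math.pi * fc * n / fs)
     for n in range(N)]

for f in (fc - fm, fc, fc + fm):
    print(f, abs(tone_coefficient(x, f, fs)))
```

The bins at $f_c \pm f_m$ come out at $A A_m/4 = 0.5$, and the bin at $f_c$ is zero up to rounding error.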

As another example, consider the finite-energy message whose spectrum is shown in Figure 3.3. Since the time domain message $m(t)$ is real-valued, its spectrum exhibits conjugate symmetry (we have chosen a complex-valued message spectrum to emphasize the latter property). The message bandwidth is denoted by $B$. The bandwidth of the DSB-SC signal is $2B$, which is twice the message bandwidth. This indicates that we are being redundant in our use of spectrum. To see this, consider the upper sideband (USB) and lower sideband (LSB) depicted in Figure 3.4. The shape of the signal in the USB (i.e., $U_p(f)$ for $f_c < f < f_c + B$) is the same as that of the message for positive frequencies (i.e., $M(f)$, $f > 0$). The shape of the signal in the LSB (i.e., $U_p(f)$ for $f_c - B < f < f_c$) is the same as that of the message for negative frequencies (i.e., $M(f)$, $f < 0$). Since $m(t)$ is real-valued, we have $M(-f) = M^*(f)$, so that we can reconstruct the message if we know its content at either positive or negative frequencies. Thus, the USB and LSB of $u(t)$ each contain enough information to reconstruct the message. The term DSB refers to the fact that we are sending both sidebands. Doing this, of course, is wasteful of spectrum. This motivates single sideband (SSB) and vestigial sideband (VSB) modulation, which are discussed a little later.

Figure 3.2: DSB-SC signal in the time and frequency domains for the sinusoidal message $m(t) = A_m \cos 2\pi f_m t$ of Figure 3.1. (a) DSB time domain waveform; (b) DSB spectrum.

Figure 3.3: Spectrum $M(f)$ of the example message (panel label: $\mathrm{Re}(M(f))$).

Figure 3.4: The spectrum of the passband DSB-SC signal for the example message in Figure 3.3.
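The redundancy argument rests on conjugate symmetry, $M(-f) = M^*(f)$ for real-valued $m(t)$. For the DFT this reads $M[N-k] = M^*[k]$, and the sketch below (using an arbitrary random test signal, an assumption made for illustration) confirms that the nonnegative-frequency half of the spectrum alone is enough to rebuild the signal.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
m = rng.standard_normal(N)          # arbitrary real-valued test message

M = np.fft.fft(m)
k = np.arange(1, N)
# Conjugate symmetry of a real signal's spectrum: M[N-k] = conj(M[k])
symmetric = np.allclose(M[N - k], np.conj(M[k]))

# Keep only the bins for f >= 0 and reconstruct the message from them
M_half = M[: N // 2 + 1]
m_rebuilt = np.fft.irfft(M_half, n=N)
print(symmetric, np.allclose(m, m_rebuilt))
```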

The term suppressed carrier is employed because, for a message with no DC component, we see from (3.2) that the transmitted signal does not have a discrete component at the carrier frequency (i.e., $U_p(f)$ does not have impulses at $\pm f_c$).


Figure 3.5: Coherent demodulation for AM. (Block diagram: the passband received signal is multiplied by $2\cos 2\pi f_c t$ and passed through a lowpass filter to produce the estimated message.)


Demodulation of DSB-SC: Since the message is contained in the I component, demodulation consists of extracting the I component of the received signal, which we know how to do from Chapter 2: multiply the received signal by the cosine of the carrier, and pass it through a lowpass filter. Ignoring noise, the received signal is given by

$$y_p(t) = A\, m(t) \cos(2\pi f_c t + \theta_r) \qquad (3.3)$$

where $\theta_r$ is the phase of the received carrier relative to the local copy of the carrier produced by the receiver's local oscillator (LO), and $A$ is the received amplitude, taking into account the propagation channel from the transmitter to the receiver. The demodulator is shown in Figure 3.5. In order for this demodulator to work well, we must have $\theta_r$ as close to zero as possible; that is, the carrier produced by the LO must be coherent with the received carrier. To see the effect of phase mismatch, let us compute the demodulator output for arbitrary $\theta_r$. Using the trigonometric identity $2\cos\theta_1 \cos\theta_2 = \cos(\theta_1 - \theta_2) + \cos(\theta_1 + \theta_2)$, we have

$$2 y_p(t) \cos(2\pi f_c t) = 2A\, m(t) \cos(2\pi f_c t + \theta_r) \cos(2\pi f_c t) = A\, m(t) \cos\theta_r + A\, m(t) \cos(4\pi f_c t + \theta_r)$$

We recognize the second term on the extreme right-hand side as a passband signal at $2f_c$ (since it is a baseband message multiplied by a carrier whose frequency exceeds the message bandwidth). It is therefore rejected by the lowpass filter. The first term is a baseband signal proportional to the message, which appears unchanged at the output of the LPF (except possibly for scaling), as long as the LPF response has been designed to be flat over the message bandwidth. The output of the demodulator is therefore given by

$$\hat{m}(t) = A\, m(t) \cos\theta_r \qquad (3.4)$$

We can also infer this using the complex baseband representation, which is what we prefer to employ instead of unwieldy trigonometric identities. The coherent demodulator in Figure 3.5 extracts the I component relative to the receiver's LO. The received signal can be written as

$$y_p(t) = A\, m(t) \cos(2\pi f_c t + \theta_r) = \mathrm{Re}\left(A\, m(t)\, e^{j(2\pi f_c t + \theta_r)}\right) = \mathrm{Re}\left(A\, m(t)\, e^{j\theta_r} e^{j2\pi f_c t}\right)$$

from which we can read off the complex envelope $y(t) = A\, m(t)\, e^{j\theta_r}$. The real part $y_c(t) = A\, m(t) \cos\theta_r$ is the I component extracted by the demodulator.

The demodulator output (3.4) is proportional to the message, which is what we want, but the proportionality constant varies with the phase of the received carrier relative to the LO. In particular, the signal is significantly attenuated as the phase mismatch increases, and is completely wiped out for $\theta_r = \pi/2$. Note that, if the carrier frequency of the LO is not synchronized with that of the received carrier (say with frequency offset $\Delta f$), then $\theta_r(t) = 2\pi \Delta f\, t + \phi$ is a time-varying phase that takes all values in $[0, 2\pi)$, which leads to time-varying attenuation of the signal amplitude, as well as unwanted sign changes. Thus, for coherent demodulation to be successful, we must drive $\Delta f$ to zero and make $\phi$ as small as possible; that is, we must synchronize to the received carrier. One possible approach is to use feedback-based techniques such as the phase locked loop, discussed later in this chapter.
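A minimal simulation of the coherent demodulator of Figure 3.5 illustrates (3.4): the output scales as $\cos\theta_r$ and vanishes at $\theta_r = \pi/2$, while a frequency offset $\Delta f$ turns the scale factor into $\cos(2\pi\Delta f\, t)$, sign changes included. The ideal FFT-based lowpass filter, the tone message, and all numerical values are assumptions made to keep the sketch short; a practical receiver would use a realizable filter.

```python
import numpy as np

def lowpass_fft(x, fs, cutoff):
    """Idealized brick-wall lowpass filter implemented in the frequency domain."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1 / fs)
    X[f > cutoff] = 0
    return np.fft.irfft(X, n=len(x))

def coherent_demod(y_p, fs, fc, cutoff):
    """Multiply by the LO signal 2cos(2*pi*fc*t), then lowpass filter (Figure 3.5)."""
    t = np.arange(len(y_p)) / fs
    return lowpass_fft(2 * y_p * np.cos(2 * np.pi * fc * t), fs, cutoff)

# Illustrative parameters (assumptions): 1 s of samples, tones on exact FFT bins
fs, N = 20_000.0, 20_000
t = np.arange(N) / fs
A, fc, fm = 1.0, 2_000.0, 50.0
m = np.cos(2 * np.pi * fm * t)

# Phase offset: the output should equal A m(t) cos(theta_r), as in (3.4)
errors = {}
for theta_r in [0.0, np.pi / 4, np.pi / 2]:
    y_p = A * m * np.cos(2 * np.pi * fc * t + theta_r)          # (3.3), noise ignored
    m_hat = coherent_demod(y_p, fs, fc, cutoff=500.0)
    errors[theta_r] = np.max(np.abs(m_hat - A * m * np.cos(theta_r)))

# Frequency offset: theta_r(t) = 2*pi*df*t, so the scale factor keeps changing sign
df = 1.0
y_p = A * m * np.cos(2 * np.pi * (fc + df) * t)
m_hat_df = coherent_demod(y_p, fs, fc, cutoff=500.0)
err_df = np.max(np.abs(m_hat_df - A * m * np.cos(2 * np.pi * df * t)))
print(errors, err_df)
```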


3.2.2 Conventional AM

In conventional AM, we add a large carrier component to a DSB-SC signal, so that the passband transmitted signal is of the form:

$$u_{AM}(t) = A\, m(t) \cos(2\pi f_c t) + A_c \cos(2\pi f_c t) \qquad (3.5)$$

Taking the Fourier transform, we have

$$U_{AM}(f) = \frac{A}{2}\left(M(f - f_c) + M(f + f_c)\right) + \frac{A_c}{2}\left(\delta(f - f_c) + \delta(f + f_c)\right)$$

which means that, in addition to the USB and LSB due to the message modulation, we also have impulses at $\pm f_c$ due to the unmodulated carrier. Figure 3.6 shows the resulting spectrum.

The key concept behind conventional AM is that, by making $A_c$ large enough, the message can be demodulated using a simple envelope detector. Large $A_c$ corresponds to expending transmitter power on sending an unmodulated carrier which carries no message information, in order to simplify the receiver. This tradeoff makes sense in a broadcast context, where one powerful transmitter may be sending information to a large number of low-cost receivers, and is the design approach that has been adopted for broadcast AM radio. A more detailed discussion follows.

The envelope of the AM signal in (3.5) is given by

$$e(t) = \left|A\, m(t) + A_c\right|$$

Figure 3.6: The spectrum of a conventional AM signal for the example message in Figure 3.3. (Panels: $\mathrm{Re}(U_{AM}(f))$ and $\mathrm{Im}(U_{AM}(f))$.)


If the term inside the magnitude operation is always nonnegative, we have $e(t) = A\, m(t) + A_c$. In this case, we can read off the message signal directly from the envelope, using AC coupling to get rid of the DC offset due to the second term. For this to happen, we must have

$$A\, m(t) + A_c \geq 0 \ \text{for all } t \quad \iff \quad A \min_t m(t) + A_c \geq 0 \qquad (3.6)$$

Let $\min_t m(t) = -M_0$, where $M_0 = |\min_t m(t)|$. (Note that the minimum value of the message must be negative if the message has zero DC value.) Equation (3.6) reduces to $-A M_0 + A_c \geq 0$, or $A_c \geq A M_0$. Let us define the modulation index $a_{mod}$ as the ratio of the size of the biggest negative incursion due to the message term to the size of the unmodulated carrier term:

$$a_{mod} = \frac{A M_0}{A_c} = \frac{A\, |\min_t m(t)|}{A_c}$$

The condition (3.6) for accurately recovering the message using envelope detection can now be rewritten as

$$a_{mod} \leq 1 \qquad (3.7)$$

It is also convenient to define a normalized version of the message as follows:

$$m_n(t) = \frac{m(t)}{M_0} = \frac{m(t)}{|\min_t m(t)|} \qquad (3.8)$$

which satisfies

$$\min_t m_n(t) = \frac{\min_t m(t)}{M_0} = -1$$

Since $A_c\, a_{mod}\, m_n(t) = A\, m(t)$, the AM signal (3.5) can be rewritten as

$$u_{AM}(t) = A_c\left(1 + a_{mod}\, m_n(t)\right) \cos(2\pi f_c t) \qquad (3.9)$$

which clearly brings out the role of the modulation index in ensuring that envelope detection works.

Figure 3.7 illustrates the impact of modulation index on the viability of envelope detection, where the message signal is the sinusoidal message in Figure 3.1. For $a_{mod} = 0.5$ and $a_{mod} = 1$, we see that the envelope equals a scaled and DC-shifted version of the message. For $a_{mod} = 1.5$, we see that the envelope no longer follows the shape of the message.
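To reproduce the behavior shown in Figure 3.7 numerically, the sketch below forms $u_{AM}(t)$ from (3.9) with the normalized sinusoid $m_n(t) = \cos 2\pi f_m t$, computes the true envelope as the magnitude of the analytic signal, and checks whether it equals $A_c(1 + a_{mod} m_n(t))$. The FFT-based analytic-signal construction is a standard technique swapped in here for illustration, and the frequencies are assumed values.

```python
import numpy as np

def envelope_fft(x):
    """Envelope |x + j*x_hat| from the FFT-based analytic signal (even-length x assumed)."""
    N = len(x)
    h = np.zeros(N)
    h[0] = 1.0                # keep DC
    h[1:N // 2] = 2.0         # double positive frequencies
    h[N // 2] = 1.0           # keep the Nyquist bin
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))   # negative frequencies are zeroed

# Illustrative parameters (assumptions)
fs, N = 10_000.0, 10_000
t = np.arange(N) / fs
Ac, fc, fm = 1.0, 1_000.0, 10.0
m_n = np.cos(2 * np.pi * fm * t)       # normalized message: minimum value -1

follows = {}
for a_mod in [0.5, 1.0, 1.5]:
    u_am = Ac * (1 + a_mod * m_n) * np.cos(2 * np.pi * fc * t)    # (3.9)
    env = envelope_fft(u_am)
    # Does the true envelope equal the scaled, DC-shifted message?
    follows[a_mod] = np.allclose(env, Ac * (1 + a_mod * m_n), atol=1e-9)
print(follows)   # {0.5: True, 1.0: True, 1.5: False}
```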

Demodulation of Conventional AM: Ignoring noise, the received signal is given by

$$y_p(t) = B\left(1 + a_{mod}\, m_n(t)\right) \cos(2\pi f_c t + \theta_r) \qquad (3.10)$$

where $\theta_r$ is a phase offset which is unknown a priori, if we do not perform carrier synchronization. However, as long as $a_{mod} \leq 1$, we can recover the message without knowing $\theta_r$ using envelope detection, since the envelope is still just a scaled and DC-shifted version of the message. Of course, the message can also be recovered by coherent detection, since, once the carrier is synchronized, the I component of the received signal equals a scaled and DC-shifted version of the message. However, by doing envelope detection instead, we can avoid carrier synchronization, thus reducing receiver complexity drastically. An envelope detector is shown in Figure 3.8, and an example (where the envelope is a straight line) showing how it works is depicted in Figure 3.9. The diode (which we assume to be ideal) conducts only in the forward direction, when the input voltage $v_{in}(t)$ of the passband signal is larger than the output voltage $v_{out}(t)$ across the RC filter. When this happens, the output voltage becomes equal to the input voltage instantaneously (under the idealization that the diode has zero resistance). In this regime, we have $v_{out}(t) = v_{in}(t)$. When the input voltage is smaller than the output voltage, the diode does not conduct, and the capacitor starts discharging through the resistor with time constant $RC$. In this regime, starting at time $t_1$, we have $v_{out}(t) = v_1 e^{-(t - t_1)/(RC)}$, where $v_1 = v_{out}(t_1)$, as shown in Figure 3.9.

Figure 3.7: Time domain AM waveforms for a sinusoidal message: (a) modulation index $a_{mod} = 0.5$; (b) $a_{mod} = 1.0$; (c) $a_{mod} = 1.5$. The envelope no longer follows the message for modulation index larger than one.

Figure 3.8: Envelope detector demodulation of AM. The envelope detector output is typically passed through a DC blocking capacitance (not shown) to eliminate the DC offset due to the carrier component of the AM signal.

Figure 3.9: The relation between the envelope detector output $v_{out}(t)$ (shown in bold) and input $v_{in}(t)$ (shown as dashed line). The output closely follows the envelope (shown as dotted line).

Roughly speaking, the capacitor gets charged at each carrier peak, and discharges between peaks. The time interval between successive charging episodes is therefore approximately equal to $1/f_c$, the time between successive carrier peaks. The factor by which the output voltage is reduced during this period due to capacitor discharge is $\exp(-1/(f_c RC))$. This must be close to one in order for the voltage to follow the envelope, rather than the variations in the sinusoidal carrier. That is, we must have $f_c RC \gg 1$. On the other hand, the decay in the envelope detector output must be fast enough (i.e., the RC time constant must be small enough) so that it can follow changes in the envelope. Since the time constant for envelope variations is inversely proportional to the message bandwidth $B$, we must have $RC \ll 1/B$. Combining these two conditions for envelope detection to work well, we have

$$\frac{1}{f_c} \ll RC \ll \frac{1}{B} \qquad (3.11)$$

This of course requires that $f_c \gg B$ (carrier frequency much larger than message bandwidth), which is typically satisfied in practice. For example, the carrier frequencies in broadcast AM radio are over 500 kHz, whereas the message bandwidth is limited to 5 kHz. Applying (3.11), the RC time constant for an envelope detector should be chosen so that

$$2\ \mu\text{s} \ll RC \ll 200\ \mu\text{s}$$

In this case, a good choice of parameters would be $RC = 20\ \mu\text{s}$, for example, with $R = 50$ ohms and $C = 400$ nanofarads.
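The ideal-diode behavior described above (the output jumps up to the input whenever the input exceeds it, and otherwise decays by a factor $e^{-T_s/(RC)}$ per sampling interval $T_s$) can be simulated directly. The sketch below uses the broadcast numbers from the text ($f_c = 500$ kHz, $RC = 20\ \mu$s) together with an assumed 1 kHz message tone, an assumed $a_{mod} = 0.5$, and an assumed 20 MHz simulation rate, and compares the suggested $RC$ with one that is too small and one that is too large.

```python
import numpy as np

def envelope_detector(v_in, fs, RC):
    """Idealized diode + RC peak detector (Figure 3.8): the ideal diode lets the output
    jump up to the input; otherwise the capacitor discharges with time constant RC."""
    decay = np.exp(-1.0 / (fs * RC))      # discharge factor over one sample
    v_out = np.empty_like(v_in)
    v = 0.0
    for n, x in enumerate(v_in):
        v = max(x, v * decay)
        v_out[n] = v
    return v_out

# fc from the broadcast example; fm, a_mod and fs are illustrative assumptions
fc, fm, a_mod, Ac = 500e3, 1e3, 0.5, 1.0
fs = 20e6
t = np.arange(int(2e-3 * fs)) / fs         # 2 ms of samples
env = Ac * (1 + a_mod * np.cos(2 * np.pi * fm * t))
v_in = env * np.cos(2 * np.pi * fc * t)

errors = {}
for RC in [2e-6, 20e-6, 1e-3]:             # too small, the suggested value, too large
    v_out = envelope_detector(v_in, fs, RC)
    errors[RC] = np.max(np.abs(v_out - env))
print(errors)
```

With $RC = 2\ \mu$s the output sags between carrier peaks, and with $RC = 1$ ms it cannot follow the falling envelope; the intermediate value tracks the envelope up to a small ripple.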

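Power efficiency of conventional AM: The price we pay for the receiver simplicity of conventional AM is power inefficiency: in (3.5), the unmodulated carrier $A_c \cos(2\pi f_c t)$ is not carrying any information regarding the message. We now compute the power efficiency $\eta_{AM}$, which is defined as the ratio of the transmitted power due to the message-bearing term $A\, m(t) \cos(2\pi f_c t)$ to the total power of $u_{AM}(t)$. In order to express the result in terms of the modulation index, let us use the expression (3.9). Denoting time averages by an overline and using $\cos^2 x = (1 + \cos 2x)/2$, we have

$$\overline{u_{AM}^2} = A_c^2\, \overline{\left(1 + a_{mod} m_n(t)\right)^2 \cos^2(2\pi f_c t)} = \frac{A_c^2}{2}\, \overline{\left(1 + a_{mod} m_n(t)\right)^2} + \frac{A_c^2}{2}\, \overline{\left(1 + a_{mod} m_n(t)\right)^2 \cos(4\pi f_c t)}$$

The second term on the right-hand side is the DC value of a passband signal at $2f_c$, which is zero. Expanding out the first term, we have

$$\overline{u_{AM}^2} = \frac{A_c^2}{2}\left(1 + a_{mod}^2\, \overline{m_n^2} + 2 a_{mod}\, \overline{m_n}\right) = \frac{A_c^2}{2}\left(1 + a_{mod}^2\, \overline{m_n^2}\right) \qquad (3.12)$$

assuming that the message has zero DC value. The power of the message-bearing term can be similarly computed as

$$\overline{\left(A_c\, a_{mod}\, m_n(t)\right)^2 \cos^2(2\pi f_c t)} = \frac{A_c^2}{2}\, a_{mod}^2\, \overline{m_n^2}$$

so that the power efficiency is given by

$$\eta_{AM} = \frac{a_{mod}^2\, \overline{m_n^2}}{1 + a_{mod}^2\, \overline{m_n^2}} \qquad (3.13)$$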


Noting that $m_n(t)$ is normalized so that its most negative value is $-1$, for messages which have comparable positive and negative excursions around zero, we expect $|m_n(t)| \leq 1$, and hence average power $\overline{m_n^2} \leq 1$ (typical values are much smaller than one). Since $a_{mod} \leq 1$ for envelope detection to work, the power efficiency of conventional AM is at best 50%. For a sinusoidal message, for example, it is easy to see that $\overline{m_n^2} = 1/2$, so that the power efficiency is at most 33%. For speech signals, which have a significantly higher peak-to-average ratio, the power efficiency is even smaller.
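Formula (3.13) is easy to evaluate numerically. In the sketch below, the function name `am_power_efficiency` is our own (hypothetical), and $\overline{m_n^2}$ is estimated as a sample mean; it reproduces the figures quoted above: 1/3 for a sinusoidal message at $a_{mod} = 1$, and the 50% ceiling when $a_{mod} = 1$ and $\overline{m_n^2} = 1$.

```python
import numpy as np

def am_power_efficiency(a_mod, mn_power):
    """Power efficiency (3.13); mn_power is the average power of m_n(t)."""
    x = a_mod**2 * mn_power
    return x / (1 + x)

# Sinusoidal message, already normalized so that its minimum is -1
fs, N = 1_000.0, 1_000
t = np.arange(N) / fs
m_n = np.cos(2 * np.pi * 5 * t)      # 5 full cycles in the window (illustrative)
mn_power = np.mean(m_n**2)           # time average, close to 1/2

eta_sine = am_power_efficiency(1.0, mn_power)   # best case a_mod = 1 for a sinusoid
eta_bound = am_power_efficiency(1.0, 1.0)       # ceiling when the average of m_n^2 is 1
print(mn_power, eta_sine, eta_bound)
```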


Example 3.2.1 (AM power efficiency computation): The message $m(t) = 2\sin 2000\pi t - 3\cos 4000\pi t$ is used in an AM system with a modulation index of 70% and carrier frequency of 580 kHz. What is the power efficiency? If the net transmitted power is 10 watts, find the magnitude spectrum of the transmitted signal.

We need to find $M_0 = |\min_t m(t)|$ in order to determine the normalized form $m_n(t) = m(t)/M_0$. To simplify notation, let $x = 2000\pi t$, and minimize $g(x) = 2\sin x - 3\cos 2x$. Since $g$ is periodic with period $2\pi$, we could minimize it numerically over a period. However, we can perform the minimization analytically in this case. Differentiating $g$, we obtain

$$g'(x) = 2\cos x + 6\sin 2x = 0$$

Using $\sin 2x = 2\sin x \cos x$, this gives

$$2\cos x + 12\sin x \cos x = 2\cos x\,(1 + 6\sin x) = 0$$

There are two solutions: $\cos x = 0$ and $\sin x = -1/6$. The first solution gives $\cos 2x = 2\cos^2 x - 1 = -1$ and $\sin x = \pm 1$, which gives $g(x) = 5$ or $g(x) = 1$. The second solution gives $\cos 2x = 1 - 2\sin^2 x = 1 - 2/36 = 17/18$, which gives $g(x) = 2(-1/6) - 3(17/18) = -19/6$. We therefore obtain

$$M_0 = |\min_t m(t)| = 19/6$$


This gives

$$m_n(t) = \frac{m(t)}{M_0} = \frac{12}{19}\sin 2000\pi t - \frac{18}{19}\cos 4000\pi t$$

so that

$$\overline{m_n^2} = (12/19)^2(1/2) + (18/19)^2(1/2) \approx 0.65$$

Substituting in (3.13), setting $a_{mod} = 0.7$, we obtain a power efficiency $\eta_{AM} \approx 0.24$, or 24%.

To figure out the spectrum of the transmitted signal, we must find $A_c$ in the formula (3.9). The power of the transmitted signal is given by (3.12) to be

$$10 = \frac{A_c^2}{2}\left(1 + a_{mod}^2\, \overline{m_n^2}\right) = \frac{A_c^2}{2}\left(1 + (0.7^2)(0.65)\right)$$

which yields $A_c \approx 3.9$. The overall AM signal is given by

$$u_{AM}(t) = A_c\left(1 + a_{mod}\, m_n(t)\right) \cos 2\pi f_c t = A_c\left(1 + a_1 \sin 2\pi f_1 t + a_2 \cos 4\pi f_1 t\right) \cos 2\pi f_c t$$

where $a_1 = 0.7(12/19) \approx 0.44$, $a_2 = 0.7(-18/19) \approx -0.66$, $f_1 = 1$ kHz and $f_c = 580$ kHz. The magnitude spectrum is given by


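$$\begin{aligned}
|U_{AM}(f)| = {} & \frac{A_c}{2}\left(\delta(f - f_c) + \delta(f + f_c)\right) \\
& + \frac{A_c |a_1|}{4}\left(\delta(f - f_c - f_1) + \delta(f - f_c + f_1) + \delta(f + f_c + f_1) + \delta(f + f_c - f_1)\right) \\
& + \frac{A_c |a_2|}{4}\left(\delta(f - f_c - 2f_1) + \delta(f - f_c + 2f_1) + \delta(f + f_c + 2f_1) + \delta(f + f_c - 2f_1)\right)
\end{aligned}$$

with numerical values shown in Figure 3.10.

Figure 3.10: Magnitude spectrum for the AM waveform in Example 3.2.1.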
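The numbers in Example 3.2.1 can be cross-checked with a few lines of code. This sketch minimizes $g(x)$ on a dense grid rather than analytically (the grid size is an arbitrary choice), then evaluates (3.13), solves (3.12) for $A_c$, and computes the line strengths $A_c/2$, $A_c|a_1|/4$ and $A_c|a_2|/4$ of the magnitude spectrum.

```python
import numpy as np

a_mod, P_total = 0.7, 10.0            # modulation index and transmitted power (W)

# Minimize g(x) = 2 sin x - 3 cos 2x over one period on a dense grid (x = 2000*pi*t)
x = np.linspace(0.0, 2 * np.pi, 1_000_001)
M0 = abs(np.min(2 * np.sin(x) - 3 * np.cos(2 * x)))     # analytic value 19/6

# m_n = m / M0 is a sum of two sinusoids; each contributes (amplitude^2)/2 to the power
mn_power = ((2 / M0) ** 2 + (3 / M0) ** 2) / 2
eta = a_mod**2 * mn_power / (1 + a_mod**2 * mn_power)    # (3.13)
Ac = np.sqrt(2 * P_total / (1 + a_mod**2 * mn_power))    # (3.12) solved for Ac

a1, a2 = a_mod * 2 / M0, -a_mod * 3 / M0                 # sideband coefficients
lines = {"carrier": Ac / 2, "fc +/- f1": Ac * abs(a1) / 4, "fc +/- 2f1": Ac * abs(a2) / 4}
print(M0, mn_power, eta, Ac, lines)
```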


3.2.3 Single Sideband Modulation (SSB)

In SSB modulation, we send either the upper sideband or the lower sideband of a DSB-SC signal. For the running example, the spectra of the passband USB and LSB signals are shown in Figure 3.11.

From our discussion of DSB-SC, we know that each sideband provides enough information to reconstruct the message. But how do we physically reconstruct the message from an SSB signal? To see this, consider the USB signal depicted in Figure 3.11(a). We can reconstruct the baseband message if we can move the component near $+f_c$ to the left by $f_c$, and the component near $-f_c$ to the right by $f_c$; that is, if we move the passband components towards the origin. These two frequency translations can be accomplished by multiplying the USB signal by $2\cos 2\pi f_c t = e^{j2\pi f_c t} + e^{-j2\pi f_c t}$, as in Figure 3.5, which creates the desired message signal at baseband, as well as undesired frequency components at $\pm 2f_c$ which can be rejected by a lowpass filter. It can be checked that the same argument applies to LSB signals as well.

It follows from the preceding discussion that SSB signals can be demodulated in exactly the same fashion as DSB-SC, using the coherent demodulator depicted in Figure 3.5. Since this demodulator simply extracts the I component of the passband signal, the I component of the SSB signal must be the message. In order to understand the structure of an SSB signal, it remains to identify the Q component. This is most easily done by considering the complex envelope of the passband transmitted signal. Consider again the example USB signal in Figure 3.11(a). The spectrum $U(f)$ of its complex envelope relative to $f_c$ is shown in Figure 3.12. Now, since $u(t) = u_c(t) + j u_s(t)$ with $u_c$ and $u_s$ real-valued, the spectra of the I and Q components can be inferred as follows:

$$U_c(f) = \frac{U(f) + U^*(-f)}{2}, \qquad U_s(f) = \frac{U(f) - U^*(-f)}{2j}$$

Applying these equations, we get I and Q components as shown in Figure 3.13.


Figure 3.11: Spectra for SSB signaling for the example message in Figure 3.3: (a) upper sideband signaling; (b) lower sideband signaling.

Figure 3.12: Complex envelope for the USB signal in Figure 3.11(a).

Figure 3.13: I and Q components for the USB signal in Figure 3.11(a).


Thus, up to scaling, the I component is $U_c(f) = M(f)$, and the Q component is a transformation of the message given by

$$U_s(f) = \begin{cases} -jM(f), & f > 0 \\ jM(f), & f < 0 \end{cases} = M(f)\left(-j\,\mathrm{sgn}(f)\right) \qquad (3.14)$$

That is, the Q component is a filtered version of the message, where the filter transfer function is $H(f) = -j\,\mathrm{sgn}(f)$. This transformation is given a special name, the Hilbert transform.

Hilbert transform: The Hilbert transform of a signal $x(t)$ is denoted by $\hat{x}(t)$, and is specified in the frequency domain as

$$\hat{X}(f) = \left(-j\,\mathrm{sgn}(f)\right) X(f)$$

This corresponds to passing $x$ through a filter with transfer function

$$H(f) = -j\,\mathrm{sgn}(f) \quad \leftrightarrow \quad h(t) = \frac{1}{\pi t}$$

where the derivation of the impulse response is left as an exercise.

Figure 3.14 shows the spectrum of the Hilbert transform of the example message in Figure 3.3. We see that it is the same (up to scaling) as the Q component of the USB signal, shown in Figure 3.13.

Physical interpretation of the Hilbert transform: If $x(t)$ is real-valued, then so is its Hilbert transform $\hat{x}(t)$. Thus, the Fourier transforms $X(f)$ and $\hat{X}(f)$ must both satisfy conjugate symmetry, and we only need to discuss what happens at positive frequencies. For $f > 0$, we have $\hat{X}(f) = -j\,\mathrm{sgn}(f) X(f) = -jX(f) = e^{-j\pi/2} X(f)$. That is, the Hilbert transform simply imposes a $\pi/2$ phase lag at all (positive) frequencies, leaving the magnitude of the Fourier transform unchanged.

Example 3.2.2 (Hilbert transform of a sinusoid): Based on the preceding argument, a sinusoid $s(t) = \cos(2\pi f_0 t + \phi)$ has Hilbert transform $\hat{s}(t) = \cos(2\pi f_0 t + \phi - \pi/2) = \sin(2\pi f_0 t + \phi)$.

Figure 3.14: Spectrum of the Hilbert transform of the example message in Figure 3.3.


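We can also do this the hard way, as follows:

$$s(t) = \cos(2\pi f_0 t + \phi) = \frac{1}{2}\left(e^{j(2\pi f_0 t + \phi)} + e^{-j(2\pi f_0 t + \phi)}\right)$$

$$\leftrightarrow \quad S(f) = \frac{1}{2}\left(e^{j\phi}\delta(f - f_0) + e^{-j\phi}\delta(f + f_0)\right)$$

Thus,

$$\hat{S}(f) = -j\,\mathrm{sgn}(f) S(f) = \frac{1}{2}\left(-j e^{j\phi}\delta(f - f_0) + j e^{-j\phi}\delta(f + f_0)\right)$$

$$\leftrightarrow \quad \hat{s}(t) = \frac{1}{2}\left(-j e^{j(2\pi f_0 t + \phi)} + j e^{-j(2\pi f_0 t + \phi)}\right)$$

which simplifies to

$$\hat{s}(t) = \frac{e^{j(2\pi f_0 t + \phi)} - e^{-j(2\pi f_0 t + \phi)}}{2j} = \sin(2\pi f_0 t + \phi)$$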
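The frequency-domain definition $\hat{X}(f) = -j\,\mathrm{sgn}(f) X(f)$ translates directly into a short FFT-based sketch. The helper name `hilbert_fft` is our own, and the discrete version assumes the signal is periodic over the window, with $\mathrm{sgn}$ taken as zero at DC and at the Nyquist bin; under these assumptions it reproduces Example 3.2.2 and leaves the magnitude spectrum unchanged.

```python
import numpy as np

def hilbert_fft(x):
    """Discrete Hilbert transform: multiply the DFT by -j*sgn(f) and transform back.
    sgn is 0 at DC; for even lengths the Nyquist bin is also set to 0."""
    N = len(x)
    H = -1j * np.sign(np.fft.fftfreq(N))
    if N % 2 == 0:
        H[N // 2] = 0.0
    return np.real(np.fft.ifft(H * np.fft.fft(x)))

# Example 3.2.2 with illustrative values: a whole number of cycles fits in the window
fs, N = 1_000.0, 1_000
t = np.arange(N) / fs
f0, phi = 12.0, 0.7
s = np.cos(2 * np.pi * f0 * t + phi)
s_hat = hilbert_fft(s)
print(np.allclose(s_hat, np.sin(2 * np.pi * f0 * t + phi)))   # pi/2 phase lag

# The magnitude spectrum is unchanged
print(np.allclose(np.abs(np.fft.fft(s_hat)), np.abs(np.fft.fft(s))))
```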

Equation (3.14) shows that the Q component of the USB signal is $\hat{m}(t)$, the Hilbert transform of the message. Thus, the passband USB signal can be written as

$$u_{USB}(t) = m(t)\cos(2\pi f_c t) - \hat{m}(t)\sin(2\pi f_c t) \qquad (3.15)$$

Similarly, we can show that the Q component of an LSB signal is $-\hat{m}(t)$, so that the passband LSB signal is given by

$$u_{LSB}(t) = m(t)\cos(2\pi f_c t) + \hat{m}(t)\sin(2\pi f_c t) \qquad (3.16)$$

SSB modulation: Conceptually, an SSB signal can be generated by filtering out one of the sidebands of a DSB-SC signal. However, it is difficult to implement the required sharp cutoff at $f_c$, especially if we wish to preserve the information contained at the boundary of the two sidebands, which corresponds to the message information near DC. Thus, an implementation of SSB based on sharp bandpass filters runs into trouble when the message has significant frequency content near DC. The representations in (3.15) and (3.16) provide an alternative approach to generating SSB signals, as shown in Figure 3.15. We have emphasized the role of 90° phase lags in generating the I and Q components, as well as the LO signals used for upconversion.
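The phasing approach of (3.15) and (3.16) can be sketched numerically: form $\hat{m}(t)$ with an FFT-based Hilbert transform (an idealization standing in for a wideband 90° phase shifter), build the USB and LSB signals, and list the frequencies that carry energy. The two-tone message, the helper names, and all numbers are assumptions for illustration.

```python
import numpy as np

def hilbert_fft(x):
    """Discrete Hilbert transform: DFT times -j*sgn(f) (0 at DC and at the Nyquist bin)."""
    N = len(x)
    H = -1j * np.sign(np.fft.fftfreq(N))
    if N % 2 == 0:
        H[N // 2] = 0.0
    return np.real(np.fft.ifft(H * np.fft.fft(x)))

# Illustrative parameters (assumptions): 1 s window, 1 Hz FFT bin spacing
fs, N, fc = 8_000.0, 8_000, 1_000.0
t = np.arange(N) / fs
m = np.cos(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)   # two-tone message
m_hat = hilbert_fft(m)

c, s = np.cos(2 * np.pi * fc * t), np.sin(2 * np.pi * fc * t)
u_usb = m * c - m_hat * s      # (3.15)
u_lsb = m * c + m_hat * s      # (3.16)

def occupied(u, threshold=1e-6):
    """Frequencies (Hz) of the bins whose normalized spectral magnitude exceeds threshold."""
    U = np.abs(np.fft.rfft(u)) / len(u)
    return np.fft.rfftfreq(len(u), d=1 / fs)[U > threshold]

usb_lines = occupied(u_usb)    # expected: fc + 50 and fc + 120 only
lsb_lines = occupied(u_lsb)    # expected: fc - 120 and fc - 50 only
print(usb_lines, lsb_lines)
```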

Figure 3.15: SSB modulation using the Hilbert transform of the message.

Figure 3.16: DSB and SSB spectra for a sinusoidal message.

Example 3.2.3 (SSB waveforms for a sinusoidal message): For a sinusoidal message $m(t) = \cos 2\pi f_m t$, we have $\hat{m}(t) = \sin 2\pi f_m t$ from Example 3.2.2. Consider the DSB signal

$$u_{DSB}(t) = 2\cos 2\pi f_m t \cos 2\pi f_c t$$

where we have normalized the signal power to one: $\overline{u_{DSB}^2} = 1$. The DSB, USB and LSB spectra are shown in Figure 3.16. From the SSB spectra shown, we can immediately write down the following time domain expressions:

$$u_{USB}(t) = \cos 2\pi(f_c + f_m)t = \cos 2\pi f_m t \cos 2\pi f_c t - \sin 2\pi f_m t \sin 2\pi f_c t$$

$$u_{LSB}(t) = \cos 2\pi(f_c - f_m)t = \cos 2\pi f_m t \cos 2\pi f_c t + \sin 2\pi f_m t \sin 2\pi f_c t$$

The preceding equations are consistent with (3.15) and (3.16). For both the USB and LSB signals, the I component equals the message: $u_c(t) = m(t) = \cos 2\pi f_m t$. The Q component for the USB signal is $u_s(t) = \hat{m}(t) = \sin 2\pi f_m t$, and the Q component for the LSB signal is $u_s(t) = -\hat{m}(t) = -\sin 2\pi f_m t$.

SSB demodulation: We now know that the message can be recovered from an SSB signal by extracting its I component using a coherent demodulator as in Figure 3.5. The difficulty of coherent demodulation lies in the requirement for carrier synchronization, and we have discussed the adverse impact of imperfect synchronization for DSB-SC signals. We now show that the performance degradation is even more significant for SSB signals. Consider a USB received signal of the form (ignoring scale factors):

$$y_p(t) = m(t)\cos(2\pi f_c t + \theta_r) - \hat{m}(t)\sin(2\pi f_c t + \theta_r) \qquad (3.17)$$

where $\theta_r$ is the phase offset with respect to the receiver LO. The complex envelope with respect to the receiver LO is given by

$$y(t) = \left(m(t) + j\hat{m}(t)\right) e^{j\theta_r} = \left(m(t) + j\hat{m}(t)\right)\left(\cos\theta_r + j\sin\theta_r\right)$$

Taking the real part, we obtain that the I component extracted by the coherent demodulator is

$$y_c(t) = m(t)\cos\theta_r - \hat{m}(t)\sin\theta_r$$

Thus, as the phase error $\theta_r$ increases, not only do we get an attenuation in the first term corresponding to the desired message (as in DSB), but we also get interference due to the second term from the Hilbert transform of the message. Thus, for coherent demodulation, accurate carrier synchronization is even more crucial for SSB than for DSB.
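The following sketch, under the same idealizations as before (FFT-based Hilbert transform, brick-wall lowpass filter, illustrative two-tone message and an assumed $\theta_r = \pi/6$), passes the USB signal (3.17) through the coherent demodulator of Figure 3.5 and confirms that the output is $m(t)\cos\theta_r - \hat{m}(t)\sin\theta_r$: attenuation of the message plus crosstalk from $\hat{m}(t)$.

```python
import numpy as np

def hilbert_fft(x):
    """Discrete Hilbert transform: DFT times -j*sgn(f) (0 at DC and at the Nyquist bin)."""
    N = len(x)
    H = -1j * np.sign(np.fft.fftfreq(N))
    if N % 2 == 0:
        H[N // 2] = 0.0
    return np.real(np.fft.ifft(H * np.fft.fft(x)))

def lowpass_fft(x, fs, cutoff):
    """Idealized brick-wall lowpass filter (frequency-domain masking)."""
    X = np.fft.rfft(x)
    X[np.fft.rfftfreq(len(x), d=1 / fs) > cutoff] = 0
    return np.fft.irfft(X, n=len(x))

fs, N, fc = 8_000.0, 8_000, 1_000.0
t = np.arange(N) / fs
m = np.cos(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)   # illustrative message
m_hat = hilbert_fft(m)

theta_r = np.pi / 6                     # assumed phase offset
# USB received signal (3.17), scale factors ignored
y_p = m * np.cos(2 * np.pi * fc * t + theta_r) - m_hat * np.sin(2 * np.pi * fc * t + theta_r)
# Coherent demodulator of Figure 3.5: mix with 2cos(2*pi*fc*t), then lowpass
y_c = lowpass_fft(2 * y_p * np.cos(2 * np.pi * fc * t), fs, cutoff=500.0)

predicted = m * np.cos(theta_r) - m_hat * np.sin(theta_r)
crosstalk = np.max(np.abs(y_c - m * np.cos(theta_r)))   # size of the m_hat interference
print(np.allclose(y_c, predicted), crosstalk)
```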

Noncoherent demodulation is also possible for SSB if we add a strong carrier term, as in conventional AM. Specifically, for a received signal given by

$$y_p(t) = \left(A + m(t)\right)\cos(2\pi f_c t + \theta_r) \pm \hat{m}(t)\sin(2\pi f_c t + \theta_r)$$

the envelope is given by

$$e(t) = \sqrt{\left(A + m(t)\right)^2 + \hat{m}^2(t)} \approx |A + m(t)| \qquad (3.18)$$

if $|A + m(t)| \gg |\hat{m}(t)|$. Subject to the approximation in (3.18), an envelope detector works just as in conventional AM.


3.2.4 Vestigial Sideband (VSB) Modulation

VSB is similar to SSB, in that it also tries to reduce the transmitted bandwidth relative to DSB, and the transmitted signal is a filtered version of the DSB signal. The idea is to mainly transmit one of the two sidebands, but to leave a vestige of the other sideband in order to ease the filtering requirements. The passband filter used to shape the DSB signal in this fashion is chosen so that the I component of the transmitted signal equals the message. To see this, consider the DSB-SC signal

$$2m(t)\cos 2\pi f_c t \quad \leftrightarrow \quad M(f - f_c) + M(f + f_c)$$

This is filtered by a passband VSB filter with transfer function $H_p(f)$, as shown in Figure 3.17, to obtain the transmitted signal with spectrum

$$U_{VSB}(f) = H_p(f)\left(M(f - f_c) + M(f + f_c)\right) \qquad (3.19)$$

Figure 3.17: Relevant passband and baseband spectra for VSB. (Annotation: $H_p(f - f_c) + H_p(f + f_c)$ constant over message band.)

A coherent demodulator extracting the I component passes $2u_{VSB}(t)\cos 2\pi f_c t$ through a lowpass filter. But

$$2u_{VSB}(t)\cos 2\pi f_c t \quad \leftrightarrow \quad U_{VSB}(f - f_c) + U_{VSB}(f + f_c)$$

which equals (substituting from (3.19))

$$H_p(f - f_c)\left(M(f - 2f_c) + M(f)\right) + H_p(f + f_c)\left(M(f) + M(f + 2f_c)\right) \qquad (3.20)$$

The $2f_c$ term, $H_p(f - f_c)M(f - 2f_c) + H_p(f + f_c)M(f + 2f_c)$, is filtered out by the lowpass filter. The output of the LPF consists of the lowpass terms in (3.20), which equal the I component, and are given by

$$M(f)\left(H_p(f - f_c) + H_p(f + f_c)\right)$$

In order for this to equal (a scaled version of) the desired message, we must have

$$H_p(f + f_c) + H_p(f - f_c) = \text{constant}, \quad |f| \leq B \qquad (3.21)$$

as shown in the example in Figure 3.17. To understand what this implies about the structure of the passband VSB filter, note that the filter impulse response can be written as $h_p(t) = h_c(t)\cos 2\pi f_c t - h_s(t)\sin 2\pi f_c t$, where $h_c(t)$ is obtained by passing $2h_p(t)\cos(2\pi f_c t)$ through a lowpass filter. But $2h_p(t)\cos(2\pi f_c t) \leftrightarrow H_p(f - f_c) + H_p(f + f_c)$. Thus, the Fourier transform involved in (3.21) is precisely the lowpass restriction of $2h_p(t)\cos(2\pi f_c t)$, i.e., it is $H_c(f)$. Thus, the correct demodulation condition for VSB in (3.21) is equivalent to requiring that $H_c(f)$ be constant over the message band. Further discussion of the structure of VSB signals is provided via problems.
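To see condition (3.21) in action, the sketch below defines a hypothetical VSB filter: `vsb_filter` and its linear roll-off of half-width $f_v$ are our own illustrative choices, not a design from the text. With the roll-off centered on the carrier, $H_p(f + f_c) + H_p(f - f_c)$ is constant across $|f| \leq B$; shifting the roll-off away from the carrier breaks the condition.

```python
import numpy as np

fc, B, fv = 100e3, 5e3, 1e3   # hypothetical carrier, message bandwidth, vestige half-width (Hz)

def vsb_filter(f, center=0.0):
    """Hypothetical real, even VSB response: a linear roll-off of half-width fv centered at
    fc + center on the positive-frequency side, passing the upper sideband up to fc + B."""
    x = np.abs(f) - fc                                  # offset from the carrier
    ramp = np.clip((x - center + fv) / (2 * fv), 0.0, 1.0)
    return np.where(x <= B, ramp, 0.0)

f = np.linspace(-B, B, 2001)                            # the message band
good = vsb_filter(f + fc) + vsb_filter(f - fc)                        # roll-off centered at fc
bad = vsb_filter(f + fc, center=fv) + vsb_filter(f - fc, center=fv)   # roll-off shifted by fv
print(np.allclose(good, 1.0), np.allclose(bad, bad[0]))
```

The centered roll-off works because the ramp is odd-symmetric about $f_c$, so the part it removes from one sideband is exactly restored by the vestige of the other.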

As with SSB, if we add a strong carrier component to the VSB signal, we can demodulate it noncoherently using an envelope detector, again at the cost of some distortion from the presence of the Q component.


3.2.5 Quadrature Amplitude Modulation

The transmitted signal in quadrature amplitude modulation (QAM) is of the form

$$u_{QAM}(t) = m_c(t)\cos 2\pi f_c t - m_s(t)\sin 2\pi f_c t$$

where $m_c(t)$ and $m_s(t)$ are separate messages (unlike SSB and VSB, where the Q component is a transformation of the message carried by the I component). In other words, a complex-valued message $m(t) = m_c(t) + j m_s(t)$ is encoded in the complex envelope of the passband transmitted signal. QAM is extensively employed in digital communication, as we shall see in later chapters. It is also used to carry color information in analog TV.
Figure 3.18: Demodulation for quadrature amplitude modulation.

Demodulation is achieved using a coherent receiver which extracts both the I and Q components, as shown in Figure 3.18. If the received signal has a phase offset $\theta$ relative to the receiver's LO, then we get both attenuation in the desired message and interference from the undesired message, as follows. Ignoring noise and scale factors, the reconstructed complex baseband message is given by

$$\hat{m}(t) = \hat{m}_c(t) + j\hat{m}_s(t) = \left(m_c(t) + j m_s(t)\right) e^{j\theta(t)} = m(t)\, e^{j\theta(t)}$$

from which we conclude that

$$\hat{m}_c(t) = m_c(t)\cos\theta(t) - m_s(t)\sin\theta(t)$$

$$\hat{m}_s(t) = m_s(t)\cos\theta(t) + m_c(t)\sin\theta(t)$$

Thus, accurate carrier synchronization ($\theta(t)$ as close to zero as possible) is important for QAM demodulation to function properly.
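A short simulation of the QAM receiver confirms the crosstalk expressions above. It assumes that the I branch multiplies by $2\cos 2\pi f_c t$ and the Q branch by $-2\sin 2\pi f_c t$ before ideal lowpass filtering (our reading of Figure 3.18, stated as an assumption), with a constant phase offset $\theta$ and two illustrative tone messages.

```python
import numpy as np

fs, N, fc = 8_000.0, 8_000, 1_000.0      # illustrative values; 1 Hz FFT bin spacing
t = np.arange(N) / fs

def lowpass_fft(x, cutoff):
    """Idealized brick-wall lowpass filter (frequency-domain masking)."""
    X = np.fft.rfft(x)
    X[np.fft.rfftfreq(len(x), d=1 / fs) > cutoff] = 0
    return np.fft.irfft(X, n=len(x))

m_c = np.cos(2 * np.pi * 40 * t)         # message on the I channel (hypothetical)
m_s = np.sin(2 * np.pi * 90 * t)         # message on the Q channel (hypothetical)
theta = 0.3                              # constant phase offset in radians

# Received QAM signal with phase offset theta, noise ignored
y_p = m_c * np.cos(2 * np.pi * fc * t + theta) - m_s * np.sin(2 * np.pi * fc * t + theta)

mc_hat = lowpass_fft(2 * y_p * np.cos(2 * np.pi * fc * t), 500.0)    # I branch
ms_hat = lowpass_fft(-2 * y_p * np.sin(2 * np.pi * fc * t), 500.0)   # Q branch

print(np.allclose(mc_hat, m_c * np.cos(theta) - m_s * np.sin(theta)),
      np.allclose(ms_hat, m_s * np.cos(theta) + m_c * np.sin(theta)))
```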
Figure 3.19: Spectrum of message and the corresponding AM signal in Example 3.2.4. Axes are not to scale.

Figure 3.20: Passband output of bandpass filter and its complex envelope with respect to 600 kHz reference, for Example 3.2.4. Axes are not to scale.


3.2.6  Concept  synthesis  for  AM 

Here  is  a  worked  problem  that  synthesizes  a  few  of  the  concepts  we  have  discussed  for  AM. 

Example 3.2.4 The signal $m(t) = 2\cos 20\pi t - \cos 40\pi t$, where the unit of time is milliseconds, is amplitude modulated using a carrier frequency $f_c$ of 600 KHz. The AM signal is given by

$$x(t) = 5\cos 2\pi f_c t + m(t)\cos 2\pi f_c t$$

(a)  Sketch  the  magnitude  spectrum  of  x.  What  is  its  bandwidth? 

(b)  What  is  the  modulation  index? 

(c) The AM signal is passed through an ideal highpass filter with cutoff frequency 595 KHz (i.e., the filter passes all frequencies above 595 KHz, and cuts off all frequencies below 595 KHz). Find an explicit time domain expression for the Q component of the filter output with respect to a 600 KHz frequency reference.

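Solution: (a) The message spectrum is $M(f) = \delta(f - 10) + \delta(f + 10) - \frac{1}{2}\left(\delta(f - 20) + \delta(f + 20)\right)$, where $f$ is in KHz.

The spectrum of the AM signal is given by

$$X(f) = \frac{5}{2}\delta(f - f_c) + \frac{5}{2}\delta(f + f_c) + \frac{1}{2}M(f - f_c) + \frac{1}{2}M(f + f_c)$$

These spectra are sketched in Figure 3.19. For positive frequencies, $X(f)$ has discrete components at 580, 590, 600, 610 and 620 KHz, so the bandwidth of the AM signal is 40 KHz, twice the 20 KHz message bandwidth.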
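The spectral lines of Example 3.2.4 can be tabulated directly. This sketch (Python with NumPy; the variable names are illustrative) lists the positive-frequency components of $X(f)$ and reads off the bandwidth. It also finds the minimum of $m(t)$ over one period, which is $-3$. Assuming the modulation index in part (b) is defined as $-\min_t m(t)/A$ for an AM signal $(A + m(t))\cos 2\pi f_c t$ (check this against the definition used earlier in the chapter), this gives $3/5 = 0.6$.

```python
import numpy as np

fc = 600.0   # carrier frequency, KHz
A = 5.0      # unmodulated carrier amplitude

# Message spectrum M(f) as {frequency in KHz: delta weight}.
M = {10.0: 1.0, -10.0: 1.0, 20.0: -0.5, -20.0: -0.5}

# X(f) = (A/2)[d(f - fc) + d(f + fc)] + (1/2)[M(f - fc) + M(f + fc)]; keep f > 0 only.
X_pos = {fc: A / 2}
for f, w in M.items():
    X_pos[fc + f] = X_pos.get(fc + f, 0.0) + w / 2

bandwidth = max(X_pos) - min(X_pos)

t = np.linspace(0.0, 0.1, 100001)   # one period of m(t); t in ms
m = 2 * np.cos(20 * np.pi * t) - np.cos(40 * np.pi * t)
mod_index = -m.min() / A            # assumed definition: -min m(t) / A

for f in sorted(X_pos):
    print(f"{f:6.1f} KHz: {X_pos[f]:+.2f}")
print("bandwidth (KHz):", bandwidth)
print("modulation index:", round(mod_index, 4))
```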

(c) From Figure 3.19, it is clear that a highpass filter with cutoff at 595 KHz selects the USB signal plus the carrier. The passband output has spectrum as shown in Figure 3.20(a), and the complex envelope with respect to 600 KHz is shown in Figure 3.20(b). Taking the inverse Fourier transform, the time domain complex envelope is given by

$$y(t) = 5 + e^{j20\pi t} - \frac{1}{2}e^{j40\pi t}$$

We can now find the Q component to be

$$y_s(t) = \mathrm{Im}(y(t)) = \sin 20\pi t - \frac{1}{2}\sin 40\pi t$$

where $t$ is in milliseconds. Another approach is to recognize that the Q component is the Q component of the USB signal, which is known to be the Hilbert transform of the message (scaled by 1/2 here). Yet another approach is to find the Q component in the frequency domain using $jY_s(f) = \left(Y(f) - Y^*(-f)\right)/2$ and then take the inverse Fourier transform. In this particular example, the first approach is probably the simplest.
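As a numerical cross-check on part (c), the sketch below (assuming NumPy; the 4 MHz sampling rate and 1 ms record are arbitrary choices that put every tone on an exact FFT bin) applies the ideal 595 KHz highpass filter in the frequency domain, forms the complex envelope by keeping the positive-frequency half (doubled) and shifting it down by 600 KHz, and compares its imaginary part with $\sin 20\pi t - \frac{1}{2}\sin 40\pi t$.

```python
import numpy as np

fs = 4000.0                      # samples per ms (i.e., 4 MHz)
fc = 600.0                       # carrier, KHz (cycles per ms)
t = np.arange(4000) / fs         # 1 ms window: each tone completes an integer number of cycles
m = 2 * np.cos(20 * np.pi * t) - np.cos(40 * np.pi * t)
x = 5 * np.cos(2 * np.pi * fc * t) + m * np.cos(2 * np.pi * fc * t)

X = np.fft.fft(x)
f = np.fft.fftfreq(len(x), d=1 / fs)        # bin frequencies in KHz

# Ideal highpass filter with a 595 KHz cutoff.
Y = np.where(np.abs(f) > 595.0, X, 0.0)

# Complex envelope w.r.t. fc: keep positive frequencies (doubled), then shift down by fc.
Y_analytic = np.where(f > 0, 2 * Y, 0.0)
y = np.fft.ifft(Y_analytic) * np.exp(-2j * np.pi * fc * t)

y_s = y.imag
y_s_formula = np.sin(20 * np.pi * t) - 0.5 * np.sin(40 * np.pi * t)
print("max |Q - formula|:", np.max(np.abs(y_s - y_s_formula)))
```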


3.3  Angle  Modulation 


We know that a passband signal can be represented as $e(t)\cos(2\pi f_c t + \theta(t))$, where $e(t)$ is the envelope, and $\theta(t)$ is the phase. Let us define the instantaneous frequency offset relative to the carrier as

$$f(t) = \frac{1}{2\pi}\frac{d\theta(t)}{dt}$$

In frequency modulation (FM) and phase modulation (PM), we encode information into the phase $\theta(t)$, with the envelope remaining constant. The transmitted signal is given by

$$u(t) = A_c\cos(2\pi f_c t + \theta(t)), \quad \text{Angle Modulation (information carried in } \theta\text{)}$$
For a message $m(t)$, we have

$$\theta(t) = k_p m(t), \quad \text{Phase Modulation} \tag{3.22}$$

and

$$\frac{1}{2\pi}\frac{d\theta(t)}{dt} = f(t) = k_f m(t), \quad \text{Frequency Modulation} \tag{3.23}$$

where $k_p$, $k_f$ are constants. Integrating (3.23), the phase of the FM waveform is given by:

$$\theta(t) = \theta(0) + 2\pi k_f\int_0^t m(\tau)\,d\tau \tag{3.24}$$


Comparing (3.24) with (3.22), we see that FM is equivalent to PM applied to the integral of the message. Similarly, for differentiable messages, PM can be interpreted as FM, with the input to the FM modulator being the derivative of the message. Figure 3.21 provides an example illustrating this relationship; this is actually a digital modulation scheme called continuous phase modulation, as we shall see when we study digital communication. In this example, the digital message +1, -1, -1, +1 is the input to an FM modulator: the instantaneous frequency switches from $f_c + k_f$ (for one time unit) to $f_c - k_f$ (for two time units) and then back to $f_c + k_f$ again. The same waveform is produced when we feed the integral of the message into a PM modulator, as shown in the figure.

Figure 3.21: The equivalence of FM and PM. (a) Messages used for angle modulation. (b) Angle modulated signal.

Figure 3.22: Phase discontinuities in PM signal due to sharp message transitions. (a) Digital input to phase modulator. (b) Phase shift keyed signal.

When the digital message of Figure 3.21 is input to a phase modulator, we get a modulated waveform with phase discontinuities when the message changes sign. This is in contrast to the output in Figure 3.21, where the phase is continuous. That is, if we compare FM and PM for the same message, we infer that FM waveforms should have less abrupt phase transitions due to the smoothing resulting from integration: compare the expressions for the phases of the modulated signals in (3.22) and (3.24) for the same message $m(t)$. Thus, for a given level of
message  variations,  we  expect  FM  to  have  smaller  bandwidth.  FM  is  therefore  preferred  to 
PM  for  analog  modulation,  where  the  communication  system  designer  does  not  have  control 
over  the  properties  of  the  message  signal  (e.g.,  the  system  designer  cannot  require  the  message 
to  be  smooth).  For  this  reason,  and  also  given  the  basic  equivalence  of  the  two  formats,  we 
restrict  the  discussion  in  the  remainder  of  this  section  to  FM  for  the  most  part.  PM,  however, 
is extensively employed in digital communication, where the system designer has significant
flexibility  in  shaping  the  message  signal.  In  this  context,  we  use  the  term  Phase  Shift  Keying 
(PSK)  to  denote  the  discrete  nature  of  the  information  encoded  in  the  message.  Figure  3.22  is 
actually  a  simple  example  of  PSK,  although  in  practice,  the  phase  of  the  modulated  signal  is 
shaped  to  be  smoother  in  order  to  improve  bandwidth  efficiency. 
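The contrast between Figures 3.21 and 3.22 can be reproduced with a few lines of NumPy. In this sketch the message +1, -1, -1, +1 is held for one time unit per symbol; the sampling density `fs` and the constants `kf` and `kp` are hypothetical. FM accumulates the message into the phase as in (3.24), which is the same as feeding the integral of the message to a phase modulator with $k_p = 2\pi k_f$, while PM of the message itself jumps by $2k_p$ at every sign change.

```python
import numpy as np

fs = 1000                              # samples per time unit (hypothetical)
dt = 1.0 / fs
symbols = np.array([+1, -1, -1, +1])   # digital message of Figure 3.21
m = np.repeat(symbols, fs)             # each symbol held for one time unit
kf = 1.0                               # hypothetical FM constant (frequency units per unit message)
kp = np.pi / 2                         # hypothetical PM constant (radians per unit message)

theta_fm = 2 * np.pi * kf * np.cumsum(m) * dt       # FM phase per (3.24), theta(0) = 0
integral_m = np.cumsum(m) * dt                      # running integral of the message
theta_pm_integral = (2 * np.pi * kf) * integral_m   # PM (3.22) of the integral, k_p = 2*pi*kf
theta_pm = kp * m                                   # PM (3.22) of the message itself

max_step_fm = np.max(np.abs(np.diff(theta_fm)))     # largest sample-to-sample phase change
max_step_pm = np.max(np.abs(np.diff(theta_pm)))
print(np.allclose(theta_fm, theta_pm_integral), max_step_fm, max_step_pm)
```

The FM phase changes by only $2\pi k_f\,dt$ per sample, so it is continuous in the limit, whereas the PM phase jumps by $\pi$ here whenever the message flips sign.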

Frequency Deviation and Modulation Index: The maximum deviation in instantaneous frequency due to a message $m(t)$ is given by

$$\Delta f_{max} = k_f\max_t|m(t)|$$

If the bandwidth of the message is $B$, the modulation index is defined as

$$\beta = \frac{\Delta f_{max}}{B} = \frac{k_f\max_t|m(t)|}{B}$$

We use the term narrowband FM if $\beta < 1$ (typically much smaller than one), and the term wideband FM if $\beta > 1$. We discuss the bandwidth occupancy of FM signals in more detail a little
later,  but  note  for  now  that  the  bandwidth  of  narrowband  FM  signals  is  dominated  by  that  of  the 
message,  while  the  bandwidth  of  wideband  FM  signals  is  dominated  by  the  frequency  deviation. 

Consider the FM signal corresponding to a sinusoidal message $m(t) = A_m\cos 2\pi f_m t$. The phase deviation due to this message (taking $\theta(0) = 0$) is given by

$$\theta(t) = 2\pi k_f\int_0^t A_m\cos(2\pi f_m\tau)\,d\tau = \frac{A_m k_f}{f_m}\sin(2\pi f_m t)$$

Since the maximum frequency deviation is $\Delta f_{max} = A_m k_f$ and the message bandwidth is $B = f_m$, the modulation index is given by $\beta = \frac{A_m k_f}{f_m}$, so that the phase deviation can be written as

$$\theta(t) = \beta\sin 2\pi f_m t \tag{3.25}$$
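To check (3.25) numerically, this sketch (NumPy; the tone amplitude, tone frequency and $k_f$ are hypothetical values chosen so that $\beta = 10$) integrates $2\pi k_f m(\tau)$ with the cumulative trapezoidal rule and compares the result against $\beta\sin 2\pi f_m t$.

```python
import numpy as np

A_m, f_m, k_f = 2.0, 1e3, 5e3        # hypothetical: amplitude, 1 kHz tone, Hz per unit of message
beta = A_m * k_f / f_m               # modulation index for a sinusoidal message
fs = 1e6
t = np.arange(2000) / fs             # 2 ms
m = A_m * np.cos(2 * np.pi * f_m * t)

# theta(t) = 2*pi*k_f * integral_0^t m(tau) dtau, using the cumulative trapezoidal rule.
steps = (m[1:] + m[:-1]) / 2 / fs
theta = 2 * np.pi * k_f * np.concatenate(([0.0], np.cumsum(steps)))

max_err = np.max(np.abs(theta - beta * np.sin(2 * np.pi * f_m * t)))
print("beta =", beta, " max error (rad):", max_err)
```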

Modulation: An FM modulator is, by definition, a Voltage Controlled Oscillator (VCO): its output is a sinusoidal wave whose instantaneous frequency offset from a reference frequency is proportional to the input signal. VCO implementations are often based on the use of varactor diodes, which provide voltage-controlled capacitance, in LC tuned circuits. This is termed direct FM modulation, in that the output of the VCO produces a passband signal with the desired frequency deviation as a function of the message. The VCO output may be at the desired carrier frequency, or at an intermediate frequency. In the latter scenario, it must be upconverted further to the carrier frequency, but this operation does not change the frequency modulation. Direct FM modulation may be employed for both narrowband and wideband modulation.
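In discrete time, a VCO is commonly modeled as a phase accumulator: each sample advances the phase by $2\pi(f_0 + k_v v[n])/f_s$, a running-sum version of (3.24). The function `vco` below is a hypothetical sketch of that model (names and parameters are illustrative, not a reference design). Driving it with a constant 0.5 V input should produce a tone at $f_0 + 0.5k_v$.

```python
import numpy as np

def vco(v, fs, f0, kv, phase0=0.0):
    """Hypothetical discrete-time VCO modeled as a phase accumulator.

    v: input samples (volts); fs: sample rate (Hz); f0: quiescent frequency (Hz);
    kv: sensitivity (Hz per volt). Returns the output cos(phase) and the phase itself.
    """
    inst_freq = f0 + kv * np.asarray(v, dtype=float)         # instantaneous frequency, Hz
    phase = phase0 + 2 * np.pi * np.cumsum(inst_freq) / fs   # running sum approximates the integral
    return np.cos(phase), phase

fs = 100e3
v = np.full(10000, 0.5)                 # 0.1 s of a constant 0.5 V input
y, phase = vco(v, fs, f0=10e3, kv=2e3)

spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(y.size, d=1 / fs)
peak_hz = freqs[np.argmax(spectrum)]
print("spectral peak at", peak_hz, "Hz")
```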

An alternative approach to wideband modulation is to first generate a narrowband FM signal (typically using a phase modulator), and to then multiply the frequency (often over multiple stages) using nonlinearities, thus increasing the frequency deviation as well as the carrier frequency. This method, which is termed indirect FM modulation, is of historical importance, but is not used in present-day FM systems because direct modulation for wideband FM is now feasible and cost-effective.

Demodulation: Many different approaches to FM demodulation have evolved over the past century. Here we discuss two important classes of demodulators: the limiter-discriminator demodulator in Section 3.3.1, and the phase locked loop in Section 3.5.
3.3.1 Limiter-Discriminator Demodulation

Figure 3.23: Limiter-Discriminator Demodulation of FM.


The task of an FM demodulator is to convert frequency variations in the passband received signal into amplitude variations, thus recovering an estimate of the message. Ideally, therefore, an FM demodulator would produce the derivative of the phase of the received signal; this is termed a discriminator, as shown in Figure 3.23. While an ideal FM signal as in (3.26) does not have amplitude fluctuations, noise and channel distortions might create such fluctuations, which lead to unwanted contributions to the discriminator output. In practice, therefore, as shown in the figure, the discriminator is typically preceded by a limiter, which removes these amplitude fluctuations. This is achieved by passing the modulated sinusoidal waveform through a hardlimiter, which generates a square wave, and then selecting the fundamental harmonic using a bandpass filter tuned to the carrier frequency. The overall structure is termed a limiter-discriminator.

Ideal limiter-discriminator: Following the limiter, we have an FM signal of the form:

$$y_p(t) = A\cos(2\pi f_c t + \theta(t))$$

where $\theta(t)$ may include contributions due to channel and noise impairments (to be discussed later), as well as the angle modulation due to the message. An ideal discriminator now produces the output (ignoring scaling factors)

$$\frac{1}{2\pi}\frac{d\theta(t)}{dt}$$
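When samples of the complex envelope are available (for example after quadrature downconversion, which swaps in a baseband structure for the passband one in the figure), an ideal discriminator can be approximated by measuring the phase increment between consecutive samples. In the sketch below (NumPy; `phase_difference_discriminator` and all numerical parameters are hypothetical), the angle of $y[n]y^*[n-1]$ gives that increment. Any positive amplitude fluctuation cancels, which mirrors the role of the limiter.

```python
import numpy as np

def phase_difference_discriminator(y, fs):
    """Estimate (1/2pi) dtheta/dt from complex envelope samples y[n] = a[n] exp(j theta[n]).

    angle(y[n] * conj(y[n-1])) is the phase step over one sample, wrapped to (-pi, pi];
    positive amplitudes a[n] cancel. Requires |phase step| < pi, i.e. |f(t)| < fs/2.
    """
    dphi = np.angle(y[1:] * np.conj(y[:-1]))
    return dphi * fs / (2 * np.pi)

fs, fm, kf = 1e6, 1e3, 5e3                          # hypothetical: sample rate, tone, Hz per unit
t = np.arange(5000) / fs                            # 5 ms
theta = (kf / fm) * np.sin(2 * np.pi * fm * t)      # FM phase for m(t) = cos(2 pi fm t)
amplitude = 1 + 0.3 * np.cos(2 * np.pi * 300 * t)   # spurious amplitude fluctuation
y = amplitude * np.exp(1j * theta)

f_hat = phase_difference_discriminator(y, fs)
t_mid = t[1:] - 0.5 / fs                            # each estimate is centered half a sample back
max_err = np.max(np.abs(f_hat - kf * np.cos(2 * np.pi * fm * t_mid)))
print("max error (Hz):", max_err)
```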


Figure 3.24: A crude discriminator based on differentiation and envelope detection.


A crude realization of a discriminator, which converts fluctuations in frequency to fluctuations in envelope, is shown in Figure 3.24. Taking the derivative of the FM signal

$$u_p(t) = A_c\cos\left(2\pi f_c t + 2\pi k_f\int_0^t m(\tau)\,d\tau + \theta_0\right) \tag{3.26}$$

we have

$$v(t) = \frac{d}{dt}u_p(t) = -A_c\left(2\pi f_c + 2\pi k_f m(t)\right)\sin\left(2\pi f_c t + 2\pi k_f\int_0^t m(\tau)\,d\tau + \theta_0\right)$$

The envelope of $v(t)$ is $2\pi A_c|f_c + k_f m(t)|$. Noting that $k_f m(t)$ is the instantaneous frequency deviation from the carrier, whose magnitude is much smaller than $f_c$ for a properly designed system, we realize that $f_c + k_f m(t) > 0$ for all $t$. Thus, the envelope equals $2\pi A_c(f_c + k_f m(t))$, so that passing $v(t)$ through an envelope detector yields a scaled and DC-shifted version of the message. Using AC coupling to reject the DC term, we obtain a scaled version of the message $m(t)$, just as in conventional AM.
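The differentiate-then-envelope-detect structure of Figure 3.24 can be simulated on a sampled passband FM signal. The sketch below (NumPy; all parameters and the helper `analytic_signal` are hypothetical) implements the differentiator as $H(f) = j2\pi f$ in the frequency domain, models the envelope detector as the magnitude of an FFT-based analytic signal (an idealization, not a diode detector), removes the mean to mimic AC coupling, and compares the result with $k_f m(t)$.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal x + j*H{x}; assumes an even-length record treated as periodic."""
    N = len(x)
    h = np.zeros(N)
    h[0] = 1.0
    h[1:N // 2] = 2.0
    h[N // 2] = 1.0
    return np.fft.ifft(np.fft.fft(x) * h)

fs, fc, fm, kf, Ac = 2e6, 100e3, 1e3, 5e3, 1.0   # hypothetical parameters (Hz, Hz per unit)
t = np.arange(20000) / fs                        # 10 ms: whole numbers of carrier and message cycles
m = np.cos(2 * np.pi * fm * t)
# FM signal (3.26) with theta0 = 0: 2*pi*kf * integral of cos(2 pi fm tau) = (kf/fm) sin(2 pi fm t).
up = Ac * np.cos(2 * np.pi * fc * t + (kf / fm) * np.sin(2 * np.pi * fm * t))

# Differentiator H(f) = j 2 pi f, applied in the frequency domain.
f = np.fft.fftfreq(len(up), d=1 / fs)
v = np.fft.ifft(1j * 2 * np.pi * f * np.fft.fft(up)).real

envelope = np.abs(analytic_signal(v))   # idealized envelope detector
recovered = envelope - envelope.mean()  # AC coupling removes the 2*pi*Ac*fc term
recovered /= 2 * np.pi * Ac             # leaves kf * m(t)
max_err = np.max(np.abs(recovered - kf * m))
print("max error (Hz):", max_err)
```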


Figure 3.25: Slope detector using a tuned circuit offset from resonance.

The differentiator in the discriminator described above corresponds to the frequency domain transfer function $H(f) = j2\pi f$, and can therefore be approximated (up to DC offsets) by transfer functions that are approximately linear over the FM band of interest. An example of such a slope detector is given in Figure 3.25, where the carrier frequency $f_c$ is chosen at an offset from the resonance frequency $f_0$ of a tuned circuit.

One problem with the simple discriminator and its approximations is that the envelope detector output has a significant DC component: when we get rid of this using AC coupling, we also attenuate low frequency components near DC. This limitation can be overcome by employing circuits that rely on the approximately linear variations in amplitude and phase of tuned circuits around resonance to synthesize approximations to an ideal discriminator whose output is the derivative of the phase. These include the Foster-Seeley detector and the ratio detector. Circuit-level details of such implementations are beyond our scope.


3.3.2  FM  Spectrum 

We first consider a naive but useful estimate of FM bandwidth termed Carson's rule. We
then  show  that  the  spectral  properties  of  FM  are  actually  quite  complicated,  even  for  a  simple 
sinusoidal  message,  and  outline  methods  of  obtaining  more  detailed  bandwidth  estimates. 

Consider an angle modulated signal, $u_p(t) = A_c\cos(2\pi f_c t + \theta(t))$, where $\theta(t)$ contains the message information. For a baseband message $m(t)$ of bandwidth $B$, the phase $\theta(t)$ for PM is also a baseband signal with the same bandwidth. The phase $\theta(t)$ for FM is the integral of the message. Since integration smooths out the time domain signal, or equivalently, attenuates higher frequencies, $\theta(t)$ is a baseband signal with bandwidth at most $B$. We therefore loosely think of $\theta(t)$ as having a bandwidth equal to $B$, the message bandwidth, for the remainder of this section.

The complex envelope of $u_p$ with respect to $f_c$ is given by

$$u(t) = A_c e^{j\theta(t)} = A_c\cos\theta(t) + jA_c\sin\theta(t)$$

Now, if $|\theta(t)|$ is small, as is the case for narrowband angle modulation, then $\cos\theta(t) \approx 1$ and $\sin\theta(t) \approx \theta(t)$, so that the complex envelope is approximately given by

$$u(t) \approx A_c + jA_c\theta(t)$$
Thus, the passband signal is approximately given by

$$u_p(t) \approx A_c\cos 2\pi f_c t - \theta(t)A_c\sin 2\pi f_c t$$

Thus, the I component has a large unmodulated carrier contribution as in conventional AM, but the message information is now in the Q component rather than in the I component, where AM places it. The Fourier transform is given by

$$U_p(f) = \frac{A_c}{2}\left(\delta(f - f_c) + \delta(f + f_c)\right) + \frac{jA_c}{2}\left(\Theta(f - f_c) - \Theta(f + f_c)\right)$$

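where $\Theta(f)$ denotes the Fourier transform of $\theta(t)$. The magnitude spectrum is therefore given by

$$|U_p(f)| = \frac{A_c}{2}\left(\delta(f - f_c) + \delta(f + f_c)\right) + \frac{A_c}{2}\left(|\Theta(f - f_c)| + |\Theta(f + f_c)|\right)$$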

Thus, the bandwidth of a narrowband FM signal is $2B$, or twice the message bandwidth, just as in AM. For example, narrowband angle modulation with a sinusoidal message $m(t) = \cos 2\pi f_m t$ occupies a bandwidth of $2f_m$: $\theta(t) = \frac{k_f}{f_m}\sin 2\pi f_m t$ for FM, and $\theta(t) = k_p\cos 2\pi f_m t$ for PM.

For wideband FM, we would expect the bandwidth to be dominated by the frequency deviation $k_f m(t)$. For messages that have positive and negative peaks of similar size, the frequency deviation ranges between $-\Delta f_{max}$ and $\Delta f_{max}$, where $\Delta f_{max} = k_f\max_t|m(t)|$. In this case, we expect the bandwidth to be dominated by the instantaneous deviations around the carrier frequency, which span an interval of length $2\Delta f_{max}$.

Carson's rule: This is an estimate for the bandwidth of a general FM signal, based on simply adding up the estimates from our separate discussion of narrowband and wideband modulation:

$$B_{FM} \approx 2B + 2\Delta f_{max} = 2B(\beta + 1), \quad \text{Carson's rule} \tag{3.28}$$

where $\beta = \Delta f_{max}/B$ is the modulation index, also called the FM deviation ratio, defined earlier.
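Carson's rule is a one-line formula. The sketch below (hypothetical helper `carson_bandwidth`, with an arbitrary 15 KHz message bandwidth) evaluates it across several modulation indices, next to the two limiting estimates it combines: $2B$ for narrowband FM and $2\Delta f_{max}$ for wideband FM.

```python
def carson_bandwidth(delta_f_max, B):
    """Carson's rule (3.28): B_FM ~ 2B + 2*delta_f_max = 2B(beta + 1)."""
    beta = delta_f_max / B
    return 2 * B * (beta + 1)

B = 15.0  # hypothetical message bandwidth, KHz
for beta in [0.1, 1.0, 5.0, 20.0]:
    bw = carson_bandwidth(beta * B, B)
    print(f"beta = {beta:5.1f}: Carson {bw:6.1f} KHz, "
          f"narrowband estimate {2 * B:5.1f} KHz, deviation-only estimate {2 * beta * B:6.1f} KHz")
```

For small $\beta$ the Carson estimate approaches $2B$, and for large $\beta$ it is dominated by $2\Delta f_{max}$, matching the two regimes discussed above.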

FM Spectrum for a Sinusoidal Message: In order to get more detailed insight into what the spectrum of an FM signal looks like, let us now consider the example of a sinusoidal message, for which the phase deviation is given by $\theta(t) = \beta\sin 2\pi f_m t$, from (3.25). The complex envelope of the FM signal with respect to $f_c$ (normalizing $A_c = 1$) is given by

$$u(t) = e^{j\theta(t)} = e^{j\beta\sin 2\pi f_m t}$$

Since the sinusoid in the exponent is periodic with period $\frac{1}{f_m}$, so is $u(t)$. It can therefore be expanded into a Fourier series of the form

$$u(t) = \sum_{n=-\infty}^{\infty} u[n]\,e^{j2\pi n f_m t}$$

where the Fourier coefficients $\{u[n]\}$ are given by

$$u[n] = f_m\int_{-\frac{1}{2f_m}}^{\frac{1}{2f_m}} e^{j\beta\sin 2\pi f_m t}\,e^{-j2\pi n f_m t}\,dt$$

Using the change of variables $2\pi f_m t = x$, we have

$$u[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j(\beta\sin x - nx)}\,dx = J_n(\beta)$$

where $J_n(\cdot)$ is the Bessel function of the first kind of order $n$. While the integrand above is complex-valued, the integral is real-valued. To see this, use Euler's formula:

$$e^{j(\beta\sin x - nx)} = \cos(\beta\sin x - nx) + j\sin(\beta\sin x - nx)$$

Since $\beta\sin x - nx$ and the sine function are both odd, the imaginary term $\sin(\beta\sin x - nx)$ above is an odd function, and integrates out to zero over $[-\pi, \pi]$. The real part is even, hence the integral over $[-\pi, \pi]$ is simply twice that over $[0, \pi]$. We summarize as follows:

$$u[n] = J_n(\beta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j(\beta\sin x - nx)}\,dx = \frac{1}{\pi}\int_0^{\pi}\cos(\beta\sin x - nx)\,dx \tag{3.29}$$


Figure 3.26: Bessel functions of the first kind, $J_n(\beta)$ versus $\beta$, for $n = 0, 1, 2, 3$.


Bessel functions are available in mathematical software packages such as Matlab and Mathematica. Figure 3.26 shows some Bessel function plots. Some properties of Bessel functions worth
noting  are  as  follows: 

• For $n$ integer, $J_n(\beta) = (-1)^n J_{-n}(\beta) = (-1)^n J_n(-\beta)$.

• For fixed $\beta$, $J_n(\beta)$ tends to zero fast as $n$ gets large, so that the complex envelope is well approximated by a finite number of Fourier series components. In particular, a good approximation is that $J_n(\beta)$ is small for $n > \beta + 1$. This leads to an approximation for the bandwidth of the FM signal given by $2(\beta + 1)f_m$, which is consistent with Carson's rule.

• For fixed $n$, $J_n(\beta)$ vanishes for specific values of $\beta$, a fact that can be used for spectral shaping.
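Equation (3.29) can itself serve as a way to compute $J_n(\beta)$ numerically. Because the integrand is $2\pi$-periodic in $x$ for integer $n$, a plain average over a uniform grid converges very quickly. The helper `bessel_j` below is a hypothetical NumPy sketch, not a library routine (in practice one would call a library function such as `scipy.special.jv`). The checks underneath illustrate the first bullet and the vanishing of $J_0$ near $\beta \approx 2.405$, one of the specific values mentioned in the last bullet.

```python
import numpy as np

def bessel_j(n, beta, num_points=256):
    """Hypothetical helper: J_n(beta) from (3.29).

    For integer n the integrand cos(beta sin x - n x) is 2*pi-periodic, so averaging it over
    a uniform grid on [-pi, pi) (the trapezoidal rule) converges very quickly.
    """
    x = -np.pi + 2 * np.pi * np.arange(num_points) / num_points
    return np.mean(np.cos(beta * np.sin(x) - n * x))

print("J_0(1) =", bessel_j(0, 1.0), " J_1(1) =", bessel_j(1, 1.0))
# First bullet: J_n(beta) = (-1)^n J_{-n}(beta) = (-1)^n J_n(-beta), here for n = 3.
print(bessel_j(3, 2.5), -bessel_j(-3, 2.5), -bessel_j(3, -2.5))
# Third bullet: J_0 vanishes near beta = 2.4048, so the carrier line disappears there.
print("J_0(2.4048256) =", bessel_j(0, 2.404825557695773))
```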


To summarize, the complex envelope of an FM signal modulated by a sinusoidal message can be written as

$$u(t) = e^{j\beta\sin 2\pi f_m t} = \sum_{n=-\infty}^{\infty} J_n(\beta)\,e^{j2\pi n f_m t} \tag{3.30}$$

The corresponding spectrum is given by

$$U(f) = \sum_{n=-\infty}^{\infty} J_n(\beta)\,\delta(f - nf_m) \tag{3.31}$$
Noting that $|J_{-n}(\beta)| = |J_n(\beta)|$, the complex envelope has discrete frequency components at $\pm nf_m$ of strength $|J_n(\beta)|$: these correspond to frequency components at $f_c \pm nf_m$ in the passband FM signal.
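To see the discrete components at $f_c \pm nf_m$ directly, the sketch below samples one message period of a passband FM signal (carrier at 50 cycles and tone at 1 cycle per time unit, with $\beta = 2$; all hypothetical values), takes its FFT, and compares each line with $|J_n(\beta)|/2$. The factor 1/2 arises because the real passband signal splits each component between positive and negative frequencies. As an independent check on (3.29), $J_n$ is computed here from its standard power series rather than from the integral; `bessel_series` and `bessel_j_any` are hypothetical helpers.

```python
import math
import numpy as np

def bessel_series(n, beta, terms=40):
    """J_n(beta) for integer n >= 0 via the power series
    sum_k (-1)^k (beta/2)^(2k+n) / (k! (k+n)!)."""
    return sum((-1) ** k * (beta / 2) ** (2 * k + n)
               / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

def bessel_j_any(n, beta):
    """Extend to negative orders using J_{-n}(beta) = (-1)^n J_n(beta)."""
    return bessel_series(n, beta) if n >= 0 else (-1) ** n * bessel_series(-n, beta)

fs, fc, fm, beta = 512, 50, 1, 2.0     # hypothetical: samples per unit time, carrier, tone, index
t = np.arange(fs) / fs                 # exactly one period of the message
up = np.cos(2 * np.pi * fc * t + beta * np.sin(2 * np.pi * fm * t))
X = np.fft.fft(up) / len(up)           # two-sided line strengths

for n in range(-4, 5):
    line = abs(X[fc + n * fm])         # strength of the component at fc + n*fm
    print(f"n = {n:+d}: FFT {line:.6f}   |J_n(beta)|/2 {abs(bessel_j_any(n, beta)) / 2:.6f}")
```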

Fractional power containment bandwidth: By Parseval's identity for Fourier series, the power of the complex envelope is given by

$$1 = \overline{|u(t)|^2} = \sum_{n=-\infty}^{\infty} J_n^2(\beta) = J_0^2(\beta) + 2\sum_{n=1}^{\infty} J_n^2(\beta)$$

so we can compute the fractional power containment bandwidth as $2Kf_m$, where $K \geq 1$ is the smallest integer such that

$$J_0^2(\beta) + 2\sum_{n=1}^{K} J_n^2(\beta) \geq \alpha$$

where $\alpha$ is the desired fraction of power within the band (e.g., $\alpha = 0.99$ for the 99% power containment bandwidth). For integer values of $\beta = 1, \ldots, 10$, we find that $K = \beta + 1$ provides a good approximation to the 99% power containment bandwidth, which is again consistent with Carson's formula.
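The smallest $K$ meeting the power containment condition is easy to search for numerically. In this sketch (NumPy), `bessel_j` evaluates (3.29) by averaging over a uniform grid and `containment_K` is a hypothetical helper; it prints $K$ next to $\beta + 1$ for $\beta = 1, \ldots, 10$ at $\alpha = 0.99$, so the approximation $K = \beta + 1$ can be compared case by case.

```python
import numpy as np

def bessel_j(n, beta, num_points=512):
    """J_n(beta) from (3.29), averaging cos(beta sin x - n x) over a uniform grid on [-pi, pi)."""
    x = -np.pi + 2 * np.pi * np.arange(num_points) / num_points
    return np.mean(np.cos(beta * np.sin(x) - n * x))

def containment_K(beta, alpha=0.99, K_max=200):
    """Smallest K >= 1 with J_0^2 + 2 * sum_{n=1}^{K} J_n^2 >= alpha (hypothetical helper)."""
    power = bessel_j(0, beta) ** 2
    for K in range(1, K_max + 1):
        power += 2 * bessel_j(K, beta) ** 2
        if power >= alpha:
            return K
    raise ValueError("increase K_max")

for beta in range(1, 11):
    K = containment_K(beta)
    print(f"beta = {beta:2d}: K = {K:2d} (beta + 1 = {beta + 1:2d}), 99% bandwidth = {2 * K} f_m")
```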


3.3.3  Concept  synthesis  for  FM 


Figure 3.27: Input to the VCO in Example 3.3.1 (time axis in microseconds).


The  following  worked  problem  brings  together  some  of  the  concepts  we  have  discussed  regarding 
FM. 

Example 3.3.1 The signal $a(t)$ shown in Figure 3.27 is fed to a VCO with quiescent frequency of 5 MHz and frequency deviation of 25 KHz/mV. Denote the output of the VCO by $y(t)$.

(a) Provide an estimate of the bandwidth of $y$. Clearly state the assumptions that you make.

(b) The signal $y(t)$ is passed through an ideal bandpass filter of bandwidth 5 KHz, centered at 5.005 MHz. Provide the simplest possible expression for the power at the filter output (if you can give a numerical answer, do so).

Solution: (a) The VCO output is an FM signal with message $m(t) = a(t)$ and

$$\Delta f_{max} = k_f\max_t|m(t)| = 25\text{ KHz/mV} \times 2\text{ mV} = 50\text{ KHz}$$

The message is periodic with period 100 microseconds, hence its fundamental frequency is 10 KHz. Approximating its bandwidth by its first harmonic, we have $B \approx 10$ KHz. Using Carson's formula, we can approximate the bandwidth of the FM signal at the VCO output as

$$B_{FM} \approx 2\Delta f_{max} + 2B \approx 120\text{ KHz}$$

(b) The complex envelope of the VCO output is given by $e^{j\theta(t)}$, where

$$\theta(t) = 2\pi k_f\int_0^t m(\tau)\,d\tau$$
For periodic messages with zero DC value (as is the case for $m(t)$ here), $\theta(t)$, and hence $e^{j\theta(t)}$, has the same period as the message. We can therefore express the complex envelope as a Fourier series with complex exponentials at frequencies $nf_m$, where $f_m = 10$ KHz is the fundamental frequency for the message, and where $n$ takes integer values. Thus, the FM signal has discrete components at $f_c + nf_m$, where $f_c = 5$ MHz in this example. A bandpass filter at 5.005 MHz
with  bandwidth  5  KHz  does  not  capture  any  of  these  components,  since  it  spans  the  interval 
[5.0025,  5.0075]  MHz,  whereas  the  nearest  Fourier  components  are  at  5  MHz  and  5.01  MHz. 
Thus,  the  power  at  the  output  of  the  bandpass  filter  is  zero. 
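The arithmetic in Example 3.3.1 can be replayed in a few lines. This sketch takes the 2 mV peak and the 100 microsecond period of the VCO input from the solution above (frequencies in KHz), applies Carson's rule, and lists which spectral lines $f_c + nf_m$ fall inside the 5.0025 to 5.0075 MHz passband; the variable names are illustrative.

```python
kf = 25.0          # VCO deviation, KHz per mV
m_peak = 2.0       # peak of the VCO input, mV (Figure 3.27)
f_m = 10.0         # fundamental of a message with 100 microsecond period, KHz
fc = 5000.0        # quiescent (carrier) frequency, KHz

delta_f = kf * m_peak                 # maximum frequency deviation
B_carson = 2 * delta_f + 2 * f_m      # Carson's rule with B ~ first harmonic

# Ideal bandpass filter centered at 5005 KHz with 5 KHz bandwidth.
lo, hi = 5005.0 - 2.5, 5005.0 + 2.5
lines = [fc + n * f_m for n in range(-20, 21)]   # spectral lines fc + n*fm
captured = [f for f in lines if lo <= f <= hi]
print(delta_f, B_carson, captured)
```

The list of captured lines is empty, consistent with the conclusion that the filter output power is zero.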


3.4  The  Superheterodyne  Receiver 

The  receiver  in  a  radio  communication  system  must  downconvert  the  passband  received  signal 
down  to  baseband  in  order  to  recover  the  message.  At  the  turn  of  the  twentieth  century,  it 
was difficult to produce amplification at frequencies beyond a few MHz using the vacuum tube
technology  of  that  time.  However,  higher  carrier  frequencies  are  desirable  because  of  the  larger 
available  bandwidths  and  the  smaller  antennas  required.  The  invention  of  the  superheterodyne, 
or  superhet,  receiver  was  motivated  by  these  considerations.  Basically,  the  idea  is  to  use  sloppy 
design  for  front  end  filtering  of  the  received  radio  frequency  (RF)  signal,  and  for  translating  it  to  a 
lower  intermediate  frequency  (IF).  The  IF  signal  is  then  processed  using  carefully  designed  filters 
and  amplifiers.  Subsequently,  the  IF  signal  can  be  converted  to  baseband  in  a  number  of  different 
ways:  for  example,  an  envelope  detector  for  AM  radio,  a  phase  locked  loop  or  discriminator 
for  FM  radio,  and  a  coherent  quadrature  demodulator  for  digital  cellular  telephone  receivers. 
While  the  original  motivation  for  the  superheterodyne  receiver  is  no  longer  strictly  applicable 
(modern analog electronics are capable of providing amplification at the carrier frequencies in
commercial  use),  it  is  still  true  that  gain  is  easier  to  provide  at  lower  frequencies  than  at  higher 
frequencies.  Furthermore,  it  becomes  possible  to  closely  optimize  the  processing  at  a  fixed  IF 
(in  terms  of  amplifier  and  filter  design),  while  permitting  a  tunable  RF  front  end  with  more 
relaxed  specifications,  which  is  important  for  the  design  of  radios  that  operate  over  a  wide 
range  of  carrier  frequencies.  For  example,  the  superhet  architecture  is  commonly  employed  for 
AM  and  FM  broadcast  radio  receivers,  where  the  RF  front  end  tunes  to  the  desired  station, 
translating  the  received  signal  to  a  fixed  IF.  Radio  receivers  built  with  discrete  components  often 
take  advantage  of  the  widespread  availability  of  inexpensive  filters  at  certain  commonly  used 
IF  frequencies,  such  as  455  KHz  (used  for  AM  radio)  and  10.7  MHz  (used  for  FM  radio).  As 
carrier  frequencies  scale  up  to  the  GHz  range  (as  is  the  case  for  modern  digital  cellular  and 
wireless  local  area  network  transceivers),  circuit  components  shrink  with  the  carrier  wavelength, 
and  it  becomes  possible  to  implement  RF  amplifiers  and  filters  using  integrated  circuits.  In  such 
settings,  a  direct  conversion  architecture,  in  which  the  passband  signal  is  directly  translated  to 
baseband,  becomes  increasingly  attractive,  as  discussed  later  in  this  section. 

The  key  element  in  frequency  translation  is  a  mixer,  which  multiplies  two  input  signals.  For  our 
present purpose, one of these inputs is a passband received signal $A(t)\cos(2\pi f_{RF}t + \theta(t))$, where the envelope $A(t)$ and phase $\theta(t)$ are baseband signals that contain message information. The second input is a local oscillator (LO) signal, which is a locally generated sinusoid $\cos(2\pi f_{LO}t)$ (we set the LO phase to zero without loss of generality, effectively adopting it as our phase reference). The output of the mixer is therefore given by

$$A\cos(2\pi f_{RF}t + \theta)\cos(2\pi f_{LO}t) = \frac{A}{2}\cos\left(2\pi(f_{RF} - f_{LO})t + \theta\right) + \frac{A}{2}\cos\left(2\pi(f_{RF} + f_{LO})t + \theta\right)$$

Thus, there are two frequency components at the output of the mixer, $f_{RF} + f_{LO}$ and $|f_{RF} - f_{LO}|$ (remember that we only need to talk about positive frequencies when discussing physically realizable signals, due to the conjugate symmetry of the Fourier transform of real-valued time signals). In the superhet receiver, we set one of these as our IF, typically the difference frequency: $f_{IF} = |f_{RF} - f_{LO}|$.
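A quick numerical check of the mixer identity: the sketch below (NumPy; the 10 MHz sample rate, the 1000 KHz RF tone, the 1455 KHz LO and the phase 0.3 are hypothetical values) multiplies the two sinusoids and locates the two strongest FFT bins, which should sit at $|f_{RF} - f_{LO}|$ and $f_{RF} + f_{LO}$.

```python
import numpy as np

fs = 10e6                                  # sample rate, Hz
t = np.arange(10000) / fs                  # 1 ms record -> 1 KHz bin spacing
f_rf, f_lo, theta = 1000e3, 1455e3, 0.3    # hypothetical RF, LO (RF + 455 KHz) and phase
mixed = np.cos(2 * np.pi * f_rf * t + theta) * np.cos(2 * np.pi * f_lo * t)

spectrum = np.abs(np.fft.rfft(mixed)) / len(mixed)
freqs = np.fft.rfftfreq(len(mixed), d=1 / fs)
peaks = np.sort(freqs[np.argsort(spectrum)[-2:]])   # two strongest components
print(peaks)                                        # difference and sum frequencies, Hz
```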


Figure 3.28: Generic block diagram for a superhet receiver.

Figure 3.29: A superhet AM receiver.


For a given RF and a fixed IF, we therefore have two choices of LO frequency when $f_{IF} = |f_{RF} - f_{LO}|$: $f_{LO} = f_{RF} - f_{IF}$ and $f_{LO} = f_{RF} + f_{IF}$. To continue the discussion, let us consider the example of AM broadcast radio, which operates over the band from 540 to 1600 KHz, with 10 KHz spacing between the carrier frequencies for different stations. The audio message signal is limited to 5 KHz bandwidth, modulated using conventional AM to obtain an RF signal of bandwidth 10 KHz. Figure 3.29 shows a block diagram for the superhet architecture commonly used in AM receivers. The RF bandpass filter must be tuned to the carrier frequency for the desired station, and at the same time, the LO frequency into the mixer must be chosen so that the difference frequency equals the IF frequency of 455 KHz. If $f_{LO} = f_{RF} + f_{IF}$, then the LO frequency ranges from 995 to 2055 KHz, corresponding to an approximately 2-fold variation in tuning range. If $f_{LO} = f_{RF} - f_{IF}$, then the LO frequency ranges from 85 to 1145 KHz, corresponding to more than 13-fold variation in tuning range. The first choice is therefore preferred, because it is easier to implement a tunable oscillator over a smaller tuning range. Having fixed the LO frequency, we have a desired signal at $f_{RF} = f_{LO} - f_{IF}$ that leads to a component at IF, and potentially an undesired image frequency at $f_{im} = f_{LO} + f_{IF} = f_{RF} + 2f_{IF}$ that also leads to a component at IF. The job of the RF bandpass filter is to block this image frequency. Thus, the filter must let in the desired signal at $f_{RF}$ (so that its bandwidth must be larger than 10 KHz), but severely attenuate the image frequency, which is 910 KHz away from the center frequency. It is therefore termed an image reject filter. We see that, for the AM broadcast radio application, a superhet architecture allows us to design the tunable image reject filter to somewhat relaxed specifications. However, the image reject filter lets in not only the signal from the desired station, but also those from adjacent stations. It is the job of the IF filter, which is tuned to the fixed frequency of 455 KHz, to filter out these adjacent stations.
Figure 3.30: The role of image rejection and channel selection in superhet receivers.

For this purpose, we use a highly selective filter at IF with a bandwidth of 10 KHz. Figure 3.30 illustrates these design considerations more generally.

Receivers for FM broadcast radio also commonly use a superhet architecture. The FM broadcast band ranges from 88 to 108 MHz, with carrier frequency separation of 200 KHz between adjacent stations. The IF is chosen at 10.7 MHz, so that the LO is tuned from 98.7 to 118.7 MHz for the choice $f_{LO} = f_{RF} + f_{IF}$. The RF filter specifications remain relaxed: it has to let in the desired signal of bandwidth 200 KHz, while rejecting an image frequency which is $2f_{IF} = 21.4$ MHz away from its center frequency. We discuss the structure of the FM broadcast signal, particularly the way in which stereo FM is transmitted, in more detail in Section 3.6.
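The LO tuning ranges and image offsets quoted for AM and FM broadcast follow from simple arithmetic. The hypothetical helper `superhet_plan` below (frequencies in any consistent unit) reproduces them for high-side ($f_{LO} = f_{RF} + f_{IF}$) and low-side ($f_{LO} = f_{RF} - f_{IF}$) LO placement.

```python
def superhet_plan(f_rf_min, f_rf_max, f_if, high_side=True):
    """Hypothetical helper: LO tuning range, tuning ratio and image offset for a superhet."""
    sign = 1 if high_side else -1
    lo_min = f_rf_min + sign * f_if
    lo_max = f_rf_max + sign * f_if
    # The image lies 2*f_if from the desired RF: above it for high-side LO, below for low-side.
    image_offset = 2 * f_if
    return lo_min, lo_max, lo_max / lo_min, image_offset

print("AM, high-side LO:", superhet_plan(540, 1600, 455))          # KHz
print("AM, low-side LO: ", superhet_plan(540, 1600, 455, False))   # KHz
print("FM, high-side LO:", superhet_plan(88, 108, 10.7))           # MHz
```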

Roughly indexing the difficulty of implementing a filter by the ratio of its center frequency to its bandwidth, or its Q factor, with high Q being more difficult to implement, we have the following fundamental tradeoff for superhet receivers. If we use a large IF, then the Q needed for the image reject filter is smaller. On the other hand, the Q needed for the IF filter to reject an interfering signal whose frequency is near that of the desired signal becomes higher. In modern digital communication applications, superheterodyne reception with multiple IF stages may be used to work around this tradeoff, providing the desired gain for the signal of interest and sufficient attenuation of interference from other signals, while achieving an adequate degree of image rejection. Image rejection can be enhanced by employing appropriately designed image-reject mixer architectures.
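The Q tradeoff can be made concrete with crude proxies. The sketch below assumes, purely for illustration, that the image-reject filter needs a bandwidth of roughly $2f_{IF}$ (the separation between the desired signal and its image) and that the IF filter needs a bandwidth of one channel. The helper `q_proxies` and both proxies are hypothetical simplifications, not design rules from the text. With an RF of 100 MHz and 200 KHz channels, raising the IF lowers the first Q and raises the second.

```python
def q_proxies(f_rf, f_if, channel_bw):
    """Hypothetical crude Q estimates (center frequency divided by bandwidth).

    Assumes the image-reject filter bandwidth is about 2*f_if (the desired-to-image
    separation) and the IF filter bandwidth is one channel.
    """
    q_image = f_rf / (2 * f_if)
    q_if = f_if / channel_bw
    return q_image, q_if

for f_if in (1.0, 10.7, 50.0):                      # candidate IFs, MHz
    q_img, q_if = q_proxies(f_rf=100.0, f_if=f_if, channel_bw=0.2)
    print(f"IF = {f_if:5.1f} MHz: image-reject Q ~ {q_img:5.1f}, IF filter Q ~ {q_if:6.1f}")
```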

Direct conversion receivers: With the trend towards increasing monolithic integration of digital communication transceivers for applications such as cellular telephony and wireless local area networks, the superhet architecture is often being supplanted by direct conversion (or zero IF) receivers, in which the passband received signal is converted directly down to baseband using a quadrature mixer at the RF carrier frequency. In this case, the desired signal is its own image, which removes the need for image rejection. Moreover, interfering signals can be filtered out at baseband, often using sophisticated digital signal processing after analog-to-digital conversion (ADC), provided that there is enough dynamic range in the circuitry to prevent a strong interferer from swamping the desired signal prior to the ADC. In contrast, the high Q bandpass filters required for image rejection and interference suppression in the superhet design must often be implemented off-chip using, for example, surface acoustic wave (SAW) devices, which are bulky and costly. Thus, direct conversion is in some sense the “obvious” thing to do, except that historically, people were unable to make it work, leading to the superhet architecture serving as the default design through most of the twentieth century. A key problem with direct conversion is that LO leakage into the RF input of the mixer causes self-mixing, leading to a DC offset. While a DC offset can be calibrated out, the main problem is that it can saturate the amplifiers following the mixer, thus swamping out the contribution of the weaker received signal. Note that the DC offset due to LO leakage is not a problem with a superhet architecture, since the DC term gets filtered out by the passband IF filter. Other problems with direct conversion include $1/f$ noise and susceptibility to second-order nonlinearities, but discussion of these issues is beyond our current scope. However, since the 1990s, integrated circuit designers have managed to overcome these and other obstacles, and direct conversion receivers have become the norm for monolithic implementations of modern digital communication transceivers. These include cellular systems in various licensed bands ranging from 900 MHz to 2 GHz, and WLANs in the 2.4 GHz and 5 GHz unlicensed bands.
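The DC offset from self-mixing follows from the same product-of-sinusoids identity used for mixers: averaging $A_{LO}\cos(2\pi f_c t) \cdot A_{leak}\cos(2\pi f_c t + \phi)$ leaves $\frac{1}{2}A_{LO}A_{leak}\cos\phi$. A minimal sketch, assuming the leaked LO reaches the RF port with a hypothetical amplitude `a_leak` and phase shift `phi`, and modeling the lowpass filter after the mixer as a time average over whole carrier cycles:

```python
import math


def self_mixing_dc(a_lo: float, a_leak: float, phi: float,
                   cycles: int = 100, samples_per_cycle: int = 64) -> float:
    """Time average of LO(t) * leaked LO(t) over whole carrier cycles.

    LO(t) = a_lo * cos(2*pi*t) with the carrier period normalized to 1;
    the leaked copy at the mixer's RF port is a_leak * cos(2*pi*t + phi).
    Averaging over whole cycles stands in for the lowpass filter after the mixer.
    """
    n = cycles * samples_per_cycle
    total = 0.0
    for k in range(n):
        t = k / samples_per_cycle
        lo = a_lo * math.cos(2 * math.pi * t)
        leak = a_leak * math.cos(2 * math.pi * t + phi)
        total += lo * leak
    return total / n


if __name__ == "__main__":
    dc = self_mixing_dc(a_lo=1.0, a_leak=0.01, phi=0.3)
    print(f"simulated DC offset {dc:.6f}, formula {0.5 * 0.01 * math.cos(0.3):.6f}")
```

The offset is a constant that carries no information, yet it can be large compared with a weak received signal, which is why it threatens the baseband amplifiers.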

The insatiable demand for communication bandwidth virtually assures us that we will seek to exploit frequency bands well beyond 5 GHz, and circuit designers will be making informed choices between the superhet and direct conversion architectures for radios at these higher frequencies. For example, the 60 GHz band in the United States has 7 GHz of unlicensed spectrum; this band is susceptible to oxygen absorption, and is therefore ideally suited for short range (e.g., 10 to 500 meters) communication both indoors and outdoors. Similarly, the 71-76 GHz and 81-86 GHz bands, which avoid oxygen absorption loss, are available for semi-unlicensed point-to-point “last mile” links. Just as with cellular and WLAN applications in lower frequency bands, we expect that proliferation of applications using these “millimeter (mm) wave” bands will require low-cost integrated circuit transceiver implementations. Based on the trends at lower frequencies, one is tempted to conjecture that initial circuit designs might be based on superhet architectures, with direct conversion receivers subsequently becoming more popular as designers become more comfortable with working at these higher frequencies. It is interesting to note that the design experience at lower carrier frequencies does not go to waste; for example, direct conversion receivers at, say, 5 GHz, can serve as the IF stage for superhet receivers for mm wave communication.


3.5 The Phase Locked Loop


The phase locked loop (PLL) is an effective FM demodulator, but it also has a far broader range of applications, including frequency synthesis and synchronization. We therefore treat it separately from our coverage of FM. The PLL provides a canonical example of the use of feedback for estimation and synchronization in communication systems, a principle that is employed in variants such as the Costas loop and the delay locked loop.

The key idea behind the PLL, depicted in Figure 3.31, is as follows: we would like to lock onto the phase of the input to the PLL. We compare the phase of the input with that of the output of a voltage controlled oscillator (VCO) using a phase detector, and the difference between the phases drives the input of the VCO. If the VCO output is ahead of the PLL input in phase, then we would like to retard the VCO output phase; if it is behind, we would like to advance it. This is done by using the phase difference to control the VCO input. Typically, rather than using the output of the phase detector directly for this purpose, we smooth it out using a loop filter in order to reduce the effect of noise.

Figure 3.31: PLL block diagram.

Figure 3.32: PLL realization using a mixer as phase detector.

Mixer as phase detector: The classical analog realization of the PLL is based on using a mixer (i.e., a multiplier) as a phase detector. To see how this works, consider the product of two sinusoids whose phases we are trying to align:

$$\cos(2\pi f_c t + \theta_1)\cos(2\pi f_c t + \theta_2) = \frac{1}{2}\cos(\theta_1 - \theta_2) + \frac{1}{2}\cos(4\pi f_c t + \theta_1 + \theta_2)$$

The second term on the right-hand side is a passband signal at $2f_c$, which can be filtered out by a lowpass filter. The first term contains the phase difference $\theta_1 - \theta_2$, and is to be used to drive the VCO so that we eventually match the phases. Thus, the first term should be small when we are near a phase match. Since the driving term is the cosine of the phase difference, the phase match condition is $\theta_1 - \theta_2 = \pi/2$. That is, using a mixer as our phase detector means that, when the PLL is locked, the phase at the VCO output is $90^\circ$ offset from the phase of the PLL input. Now that we know this, we adopt a more convenient notation, changing variables to define a phase difference whose value at the desired matched state is zero rather than $\pi/2$. Let the PLL input be denoted by $A_c\cos(2\pi f_c t + \theta_i(t))$, and let the VCO output be denoted by $A_v\cos(2\pi f_c t + \theta_o(t) + \frac{\pi}{2}) = -A_v\sin(2\pi f_c t + \theta_o(t))$. The output of the mixer is now given by

$$-A_c A_v\cos(2\pi f_c t + \theta_i(t))\sin(2\pi f_c t + \theta_o(t)) = \frac{A_c A_v}{2}\sin(\theta_i(t) - \theta_o(t)) - \frac{A_c A_v}{2}\sin(4\pi f_c t + \theta_i(t) + \theta_o(t))$$

The second term on the right-hand side is a passband signal at $2f_c$, which can be filtered out as before. The first term is the desired driving term, and with the change of notation, we note that the desired state, when the driving term is zero, corresponds to $\theta_i = \theta_o$. The mixer-based realization of the PLL is shown in Figure 3.32.
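The lowpass-filtered mixer output can be checked numerically. The sketch below is a minimal illustration, assuming a normalized carrier period of 1 and modeling the lowpass filter as an average over whole carrier cycles (our simplification); it should return $\frac{1}{2}A_c A_v\sin(\theta_i - \theta_o)$.

```python
import math


def mixer_pd_output(theta_i: float, theta_o: float, a_c: float = 1.0, a_v: float = 1.0,
                    cycles: int = 50, samples_per_cycle: int = 64) -> float:
    """Lowpass-filtered mixer output for PLL input a_c*cos(2*pi*t + theta_i)
    and VCO output -a_v*sin(2*pi*t + theta_o), carrier period normalized to 1.
    Averaging over whole carrier cycles removes the double-frequency term."""
    n = cycles * samples_per_cycle
    total = 0.0
    for k in range(n):
        t = k / samples_per_cycle
        pll_in = a_c * math.cos(2 * math.pi * t + theta_i)
        vco_out = -a_v * math.sin(2 * math.pi * t + theta_o)
        total += pll_in * vco_out
    return total / n


if __name__ == "__main__":
    for theta_err in (0.0, 0.2, math.pi / 2, -1.0):
        sim = mixer_pd_output(theta_err, 0.0, a_c=2.0, a_v=0.5)
        formula = 0.5 * 2.0 * 0.5 * math.sin(theta_err)
        print(f"theta_i - theta_o = {theta_err:+.3f}: simulated {sim:+.4f}, formula {formula:+.4f}")
```

Note that the output vanishes at $\theta_i = \theta_o$ and has positive slope there, which is what the feedback loop needs.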

The instantaneous frequency of the VCO is proportional to its input. Thus the phase of the VCO output $-\sin(2\pi f_c t + \theta_o(t))$ is given by

$$\theta_o(t) = K_v \int_0^t x(\tau)\, d\tau$$

ignoring integration constants. Taking Laplace transforms, we have $\Theta_o(s) = K_v X(s)/s$. The reference frequency $f_c$ is chosen as the quiescent frequency of the VCO, which is the frequency it would produce when its input voltage is zero.


Figure 3.33: PLL realization using XOR gate as phase detector.

Figure 3.34: Response for the XOR phase detector.

Mixed signal phase detectors: Modern hardware realizations of the PLL, particularly for applications involving digital waveforms (e.g., a clock signal), often realize the phase detector using digital logic. The most rudimentary of these is an exclusive or (XOR) gate, as shown in Figure 3.33. For the scenario depicted in the figure, we see that the average value of the output of the XOR gate is linearly related to the phase offset $\gamma$. Normalizing a period of the square wave to length $2\pi$, this DC value $V$ is related to $\gamma$ as shown in Figure 3.34(a). Note that, for zero phase offset, the two inputs are identical and the XOR output stays low, so that $V = V_{lo}$, and that the response is symmetric around $\gamma = 0$. In order to get a linear phase detector response going through the origin, we translate this curve along both axes: we define $\tilde{V} = V - (V_{lo} + V_{hi})/2$ as a centered response, and we define the phase offset $\theta = \gamma - \pi/2$. Thus, the lock condition ($\theta = 0$) corresponds to the square waves being $90^\circ$ out of phase. This translation gives us the phase response shown in Figure 3.34(b), which looks like a triangular version of the sinusoidal response for the mixer-based phase detector.
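The triangular XOR response can be reproduced by averaging the gate output over one period. A minimal sketch, assuming ideal 50% duty-cycle square waves with period normalized to $2\pi$ and logic levels $V_{lo} = 0$, $V_{hi} = 1$ by default (illustrative values); `xor_centered` applies the shift $\theta = \gamma - \pi/2$ and subtracts $(V_{lo} + V_{hi})/2$ as in the text.

```python
import math


def square(phase: float) -> int:
    """50% duty-cycle square wave with period 2*pi: 1 on [0, pi), 0 on [pi, 2*pi)."""
    return 1 if (phase % (2 * math.pi)) < math.pi else 0


def xor_average(gamma: float, v_lo: float = 0.0, v_hi: float = 1.0, n: int = 20_000) -> float:
    """Average XOR output voltage when one square wave lags the other by gamma."""
    total = 0.0
    for k in range(n):
        phase = 2 * math.pi * (k + 0.5) / n  # midpoint samples over one period
        out = square(phase) ^ square(phase - gamma)
        total += v_hi if out else v_lo
    return total / n


def xor_centered(theta: float, v_lo: float = 0.0, v_hi: float = 1.0, n: int = 20_000) -> float:
    """Centered response: lock (theta = 0) corresponds to gamma = pi/2."""
    return xor_average(theta + math.pi / 2, v_lo, v_hi, n) - (v_lo + v_hi) / 2


if __name__ == "__main__":
    for theta in (-math.pi / 2, -0.5, 0.0, 0.5, math.pi / 2):
        print(f"theta = {theta:+.3f}: centered output {xor_centered(theta):+.4f}")
```

In this model the centered output is $(V_{hi} - V_{lo})\,\theta/\pi$ for $|\theta| \le \pi/2$, the linear portion of the triangle.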

The simple XOR-based phase detector has the disadvantage of requiring that the waveforms have a 50% duty cycle. In practice, more sophisticated phase detectors, often based on edge detection, are used. These include “phase-frequency detectors” that directly provide information on frequency differences, which is useful for rapid locking. While discussion of the many phase detector variants employed in hardware design is beyond our scope, references for further study are provided at the end of this chapter.


3.5.1 PLL Applications


Before trying to understand how a PLL works in more detail, let us discuss how we would use it, assuming that it has been properly designed. That is, suppose we can design a system such as that depicted in Figure 3.32, such that $\theta_o(t) \approx \theta_i(t)$. What would we do with such a system?


Figure 3.35: The PLL is an FM demodulator.


PLL as FM demodulator: If the PLL input is an FM signal, its phase is given by

$$\theta_i(t) = 2\pi k_f \int_0^t m(\tau)\, d\tau$$

The VCO output phase is given by

$$\theta_o(t) = K_v \int_0^t x(\tau)\, d\tau$$

If $\theta_o \approx \theta_i$, then $\frac{d\theta_o}{dt} \approx \frac{d\theta_i}{dt}$, so that

$$K_v x(t) \approx 2\pi k_f m(t)$$

That is, the VCO input is approximately equal to a scaled version of the message. Thus, the PLL is an FM demodulator, where the FM signal is the input to the PLL, and the demodulator output is the VCO input, as shown in Figure 3.35.
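The demodulation argument can be tested with a phase-domain simulation, in which only phases are tracked and the carrier itself is not simulated. A minimal sketch, assuming the first order loop ($G(s) = 1$) of the nonlinear model of Figure 3.37, so that $K_v x(t) = K\sin\theta_e(t)$; the message $m(t) = \cos(2\pi \cdot 50\,t)$, $k_f = 1000$ Hz and $K = 2\pi \cdot 5000$ rad/s are illustrative values, not from the text.

```python
import math


def pll_fm_demod(kf: float = 1000.0, f_msg: float = 50.0,
                 loop_gain: float = 2 * math.pi * 5000.0,
                 dt: float = 1e-6, t_end: float = 0.05):
    """Phase-domain sketch of a first order PLL (G(s) = 1) driven by FM.

    Returns lists (t, Kv*x(t), 2*pi*kf*m(t)), where Kv*x(t) = K*sin(theta_e)
    is the (scaled) VCO input, i.e. the demodulator output."""
    theta_i = 0.0
    theta_o = 0.0
    ts, demod, ideal = [], [], []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        m = math.cos(2 * math.pi * f_msg * t)
        kv_x = loop_gain * math.sin(theta_i - theta_o)  # Kv * x(t)
        ts.append(t)
        demod.append(kv_x)
        ideal.append(2 * math.pi * kf * m)
        # Forward-Euler updates: FM input phase and VCO phase.
        theta_i += 2 * math.pi * kf * m * dt
        theta_o += kv_x * dt
    return ts, demod, ideal


if __name__ == "__main__":
    ts, demod, ideal = pll_fm_demod()
    half = len(ts) // 2
    worst = max(abs(d - i) for d, i in zip(demod[half:], ideal[half:]))
    print(f"worst-case error after the transient: {worst:.1f} rad/s "
          f"(peak 2*pi*kf = {2 * math.pi * 1000.0:.1f} rad/s)")
```

The residual error comes mainly from the loop's finite bandwidth: the VCO lags the input phase by roughly $1/K$ seconds.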

PLL as frequency synthesizer: The PLL is often used to synthesize the local oscillators used in communication transmitters and receivers. In a typical scenario, we might have a crystal oscillator which provides an accurate frequency reference at a relatively low frequency, say 40 MHz. We wish to use this to derive an accurate frequency reference at a higher frequency, say 1 GHz, which might be the local oscillator used at an IF or RF stage in the transceiver. We have a VCO that can produce frequencies around 1 GHz (but is not calibrated to produce the exact value of the desired frequency), and we wish to use it to obtain a frequency $f_o$ that is exactly $K$ times the crystal frequency $f_{crystal}$. This can be achieved by adding a frequency divider into the PLL loop, as shown in Figure 3.36. Such frequency dividers can be implemented digitally by appropriately skipping pulses. Many variants of this basic concept are possible, such as using multiple frequency dividers, frequency multipliers, or multiple interacting loops.
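The divider-in-the-loop idea can be sketched in the phase domain. This is a simplified model under stated assumptions: phases are counted in cycles, the phase detector is idealized as linear, the loop is first order, and the VCO free-running frequency `f_free` and gain `kv` are hypothetical values. The text calls the multiplication factor $K$; the sketch names it `n_div` to avoid a clash with the loop gain.

```python
def synthesize(f_ref: float = 40e6, n_div: int = 25, f_free: float = 0.98e9,
               kv: float = 1e9, dt: float = 1e-9, steps: int = 2000) -> float:
    """Phase-domain sketch of an integer-N synthesizer with a first order loop.

    The VCO phase is divided by n_div before comparison with the crystal
    reference. f_free is the uncalibrated free-running VCO frequency (Hz) and
    kv the VCO gain (Hz per cycle of phase error). Returns the final VCO frequency."""
    theta_ref = 0.0  # crystal reference phase (cycles)
    theta_vco = 0.0  # VCO phase (cycles)
    f_vco = f_free
    for _ in range(steps):
        err = theta_ref - theta_vco / n_div  # phase error after the divider
        f_vco = f_free + kv * err            # VCO frequency set by its input
        theta_vco += f_vco * dt
        theta_ref += f_ref * dt
    return f_vco


if __name__ == "__main__":
    for n in (25, 26):
        print(f"n_div = {n}: VCO settles at {synthesize(n_div=n) / 1e9:.6f} GHz")
```

At lock the divided VCO frequency equals the reference, so the VCO runs at `n_div * f_ref` regardless of its calibration error; as expected for a first order loop, a constant phase offset remains at the detector.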

All of these applications rely on the basic property that the VCO output phase successfully tracks some reference phase using the feedback in the loop. Let us now try to get some insight into how this happens, and into the impact of various parameters on the PLL's performance.


Figure 3.36: Frequency synthesis using a PLL by inserting a frequency divider into the loop.


3.5.2 Mathematical Model for the PLL


Figure 3.37: Nonlinear model for mixer-based PLL.


The mixer-based PLL in Figure 3.32 can be modeled as shown in Figure 3.37, where $\theta_i(t)$ is the input phase, and $\theta_o(t)$ is the output phase. It is also useful to define the corresponding instantaneous frequencies (or rather, frequency deviations from the VCO quiescent frequency $f_c$):

$$f_i(t) = \frac{1}{2\pi}\frac{d\theta_i(t)}{dt}, \qquad f_o(t) = \frac{1}{2\pi}\frac{d\theta_o(t)}{dt}$$

The phase and frequency errors are defined as

$$\theta_e(t) = \theta_i(t) - \theta_o(t), \qquad f_e(t) = f_i(t) - f_o(t)$$

In deriving this model, we can ignore the passband term at $2f_c$, which will get rejected by the integration operation due to the VCO, as well as by the loop filter (if a nontrivial lowpass loop filter is employed). From Figure 3.32, the sine of the phase difference is amplified by $\frac{1}{2}A_c A_v$ due to the amplitudes of the PLL input and VCO output. This is passed through the loop filter, which has transfer function $G(s)$, and then through the VCO, which has transfer function $K_v/s$. The loop gain $K$ shown in Figure 3.37 is set to be the product $K = \frac{1}{2}A_c A_v K_v$ (in addition, the loop gain also includes any additional amplification or attenuation in the loop that is not accounted for in the transfer function $G(s)$).

Figure 3.38: Linearized PLL model.

The model in Figure 3.37 is difficult to analyze because of the $\sin(\cdot)$ nonlinearity after the phase difference operation. One way to avoid this difficulty is to linearize the model by simply dropping the nonlinearity. The motivation is that, when the input and output phases are close, as is the case when the PLL is in tracking mode, then

$$\sin(\theta_i - \theta_o) \approx \theta_i - \theta_o$$

Applying this approximation, we obtain the linearized model of Figure 3.38. Note that, for the XOR-based response shown in Figure 3.34(b), the response is exactly linear for $|\theta| < \pi/2$.


3.5.3 PLL Analysis


Under the linearized model, the PLL becomes an LTI system whose analysis is conveniently performed using the Laplace transform. From Figure 3.38, we see that

$$\left(\Theta_i(s) - \Theta_o(s)\right)\frac{K G(s)}{s} = \Theta_o(s)$$

from which we infer the input-output relationship

$$H(s) = \frac{\Theta_o(s)}{\Theta_i(s)} = \frac{K G(s)}{s + K G(s)} \qquad (3.32)$$

It is also useful to express the phase error $\theta_e$ in terms of the input $\theta_i$, as follows:

$$H_e(s) = \frac{\Theta_e(s)}{\Theta_i(s)} = \frac{\Theta_i(s) - \Theta_o(s)}{\Theta_i(s)} = \frac{s}{s + K G(s)} \qquad (3.33)$$


For this LTI model, the same transfer functions also govern the relationships between the input and output instantaneous frequencies: since $F_i(s) = \frac{s}{2\pi}\Theta_i(s)$ and $F_o(s) = \frac{s}{2\pi}\Theta_o(s)$, we obtain $F_o(s)/F_i(s) = \Theta_o(s)/\Theta_i(s)$. Thus, we have

$$\frac{F_o(s)}{F_i(s)} = H(s) = \frac{K G(s)}{s + K G(s)} \qquad (3.34)$$

$$\frac{F_i(s) - F_o(s)}{F_i(s)} = H_e(s) = \frac{s}{s + K G(s)} \qquad (3.35)$$
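Transfer functions (3.32) through (3.35) are easy to evaluate on the $j\omega$ axis. The sketch below is a minimal helper (the name `pll_transfer` and the loop gain used in the demo are illustrative); the loop filter is passed in as a function $G(s)$, so both the first order loop and the proportional-plus-integral loop discussed next can be plugged in.

```python
import math
from typing import Callable


def pll_transfer(s: complex, k: float,
                 g: Callable[[complex], complex]) -> tuple[complex, complex]:
    """Linearized PLL: return (H(s), H_e(s)) from (3.32)-(3.33) for loop gain k
    and loop filter transfer function g(s)."""
    kg = k * g(s)
    denom = s + kg
    return kg / denom, s / denom


def first_order(s: complex) -> complex:
    """Trivial loop filter G(s) = 1."""
    return 1.0


if __name__ == "__main__":
    K = 2 * math.pi * 1000.0  # illustrative loop gain (rad/s)
    for f in (10.0, 1000.0, 100e3):
        h, he = pll_transfer(1j * 2 * math.pi * f, K, first_order)
        print(f"f = {f:9.1f} Hz: |H| = {abs(h):.4f}, |He| = {abs(he):.4f}")
```

Well inside the loop bandwidth $|H| \approx 1$ (the VCO follows the input), while far outside it $|H_e| \approx 1$ (the input variations pass straight into the phase error), and $H(s) + H_e(s) = 1$ at every $s$.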
First order PLL: When we have a trivial loop filter, $G(s) = 1$, we obtain the first order response

$$H(s) = \frac{K}{s + K}, \qquad H_e(s) = \frac{s}{s + K}$$

which is a stable response for loop gain $K > 0$, with a single pole at $s = -K$. It is interesting to see what happens when the input phase is a step function, $\theta_i(t) = \Delta\theta\, I_{[0,\infty)}(t)$, or $\Theta_i(s) = \Delta\theta/s$. We obtain

$$\Theta_o(s) = H(s)\Theta_i(s) = \frac{K\Delta\theta}{s(s + K)} = \frac{\Delta\theta}{s} - \frac{\Delta\theta}{s + K}$$

Taking the inverse Laplace transform, we obtain

$$\theta_o(t) = \Delta\theta\,(1 - e^{-Kt})\, I_{[0,\infty)}(t)$$

so that $\theta_o(t) \to \Delta\theta$ as $t \to \infty$. Thus, the first order PLL can track a sudden change in phase, with the output phase converging to the input phase exponentially fast. The residual phase error is zero. Note that we could also have inferred this quickly from the final value theorem, without taking the inverse Laplace transform:

$$\lim_{t\to\infty}\theta_e(t) = \lim_{s\to 0} s\,\Theta_e(s) = \lim_{s\to 0} s\,H_e(s)\,\Theta_i(s) \qquad (3.36)$$

Specializing to the setting of interest, we obtain

$$\lim_{t\to\infty}\theta_e(t) = \lim_{s\to 0} s\,\frac{s}{s + K}\,\frac{\Delta\theta}{s} = 0$$
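A direct time-domain simulation confirms the closed form. The sketch integrates the linearized first order loop $\frac{d\theta_o}{dt} = K(\theta_i - \theta_o)$ with forward Euler steps (our choice of integrator) and compares the result against $\Delta\theta(1 - e^{-Kt})$; $K = 1000$ s$^{-1}$ and $\Delta\theta = 1$ rad are illustrative values.

```python
import math


def first_order_phase_step(k: float = 1000.0, dtheta: float = 1.0,
                           dt: float = 1e-6, t_end: float = 20e-3) -> tuple[float, float]:
    """Forward-Euler simulation of the linearized first order PLL,
    d(theta_o)/dt = K * (theta_i - theta_o), for theta_i(t) = dtheta, t >= 0.

    Returns (largest deviation from dtheta * (1 - exp(-K t)),
    phase error theta_i - theta_o at the end of the run)."""
    theta_o = 0.0
    max_dev = 0.0
    for n in range(int(round(t_end / dt))):
        t = n * dt
        closed_form = dtheta * (1.0 - math.exp(-k * t))
        max_dev = max(max_dev, abs(theta_o - closed_form))
        theta_o += k * (dtheta - theta_o) * dt
    return max_dev, dtheta - theta_o


if __name__ == "__main__":
    dev, final_err = first_order_phase_step()
    print(f"max deviation from closed form: {dev:.2e} rad, final phase error: {final_err:.2e} rad")
```

The final phase error decays toward zero, matching the final value theorem result.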


We now examine the response of the first order PLL to a frequency step $\Delta f$, so that the instantaneous input frequency is $f_i(t) = \Delta f\, I_{[0,\infty)}(t)$. The corresponding Laplace transform is $F_i(s) = \Delta f/s$. The input phase is the integral of the instantaneous frequency:

$$\theta_i(t) = 2\pi \int_0^t f_i(\tau)\, d\tau$$

The Laplace transform of the input phase is therefore given by

$$\Theta_i(s) = \frac{2\pi F_i(s)}{s} = \frac{2\pi\Delta f}{s^2}$$

Given that the input-output relationships are identical for frequency and phase, we can reuse the computations we did for the phase step input, replacing phase by frequency, to conclude that $f_o(t) = \Delta f\,(1 - e^{-Kt})\, I_{[0,\infty)}(t) \to \Delta f$ as $t \to \infty$, so that the steady-state frequency error is zero. The corresponding output phase trajectory is left as an exercise, but we can use the final value theorem to compute the limiting value of the phase error:

$$\lim_{t\to\infty}\theta_e(t) = \lim_{s\to 0} s\,\frac{s}{s + K}\,\frac{2\pi\Delta f}{s^2} = \frac{2\pi\Delta f}{K}$$


Thus, the first order PLL can adapt its frequency to track a step frequency change, but there is a nonzero steady-state phase error. This can be fixed by increasing the order of the PLL, as we now show.
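The frequency-step result can be checked the same way. A minimal sketch of the linearized first order loop driven by $\theta_i(t) = 2\pi\Delta f\, t$, with illustrative values $K = 2\pi \cdot 5000$ rad/s and $\Delta f = 1$ KHz: the VCO frequency offset should settle at $\Delta f$ and the phase error at $2\pi\Delta f/K = 0.2$ rad.

```python
import math


def first_order_freq_step(k: float = 2 * math.pi * 5000.0, df: float = 1000.0,
                          dt: float = 1e-6, t_end: float = 5e-3) -> tuple[float, float]:
    """Linearized first order PLL (G(s) = 1) driven by a frequency step of df Hz,
    i.e. theta_i(t) = 2*pi*df*t for t >= 0.

    Returns (VCO frequency offset f_o in Hz, phase error theta_e in rad) at t_end."""
    theta_i = 0.0
    theta_o = 0.0
    f_o = 0.0
    for _ in range(int(round(t_end / dt))):
        theta_e = theta_i - theta_o
        dtheta_o = k * theta_e           # VCO phase rate, driven by K * theta_e
        f_o = dtheta_o / (2 * math.pi)   # instantaneous frequency offset of the VCO
        theta_o += dtheta_o * dt
        theta_i += 2 * math.pi * df * dt
    return f_o, theta_i - theta_o


if __name__ == "__main__":
    f_o, theta_e = first_order_freq_step()
    print(f"f_o = {f_o:.3f} Hz, theta_e = {theta_e:.4f} rad "
          f"(2*pi*df/K = {2 * math.pi * 1000.0 / (2 * math.pi * 5000.0):.4f})")
```

The frequency error vanishes, but the loop must hold a constant phase error to keep driving the VCO off its quiescent frequency.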

Second order PLL: We now introduce a loop filter which feeds back both the phase error and the integral of the phase error to the VCO input (in control theory terminology, we are using “proportional plus integral” feedback). That is, $G(s) = 1 + a/s$, where $a > 0$. This yields the second order response

$$H(s) = \frac{K G(s)}{s + K G(s)} = \frac{K s + K a}{s^2 + K s + K a}, \qquad H_e(s) = \frac{s}{s + K G(s)} = \frac{s^2}{s^2 + K s + K a}$$

The poles of the response are at $s = \frac{-K \pm \sqrt{K^2 - 4Ka}}{2}$. It is easy to check that the response is stable (i.e., the poles are in the left half plane) for $K > 0$. The poles are complex conjugates with a nonzero imaginary component if $K^2 - 4Ka < 0$, or $K < 4a$; otherwise they are both real-valued. Note that the phase error due to a step frequency input does go to zero. This is easily seen by invoking the final value theorem (3.36):

$$\lim_{t\to\infty}\theta_e(t) = \lim_{s\to 0} s\,\frac{s^2}{s^2 + K s + K a}\,\frac{2\pi\Delta f}{s^2} = 0$$


Thus, the second order PLL has zero steady-state frequency and phase errors when responding to a constant frequency offset.

We have now seen that the first order PLL can handle step phase changes, and the second order PLL can handle step frequency changes, while driving the steady-state phase error to zero. This pattern continues as we keep increasing the order of the PLL: for example, a third order PLL can handle a linear frequency ramp, which corresponds to $\Theta_i(s)$ being proportional to $1/s^3$.
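The second order loop can be examined numerically as well. The sketch below computes the poles of $s^2 + Ks + Ka$ and simulates the linearized proportional-plus-integral loop driven by a frequency step, using forward Euler steps. The defaults $K = 2\pi \cdot 5000$ rad/s, $a = 2\pi \cdot 1000$ rad/s and $\Delta f = 1$ KHz are illustrative; with them $K > 4a$, so the poles are real.

```python
import cmath
import math


def second_order_poles(k: float, a: float) -> tuple[complex, complex]:
    """Roots of s^2 + K s + K a, the denominator for G(s) = 1 + a/s."""
    disc = cmath.sqrt(k * k - 4 * k * a)
    return (-k + disc) / 2, (-k - disc) / 2


def second_order_freq_step(k: float = 2 * math.pi * 5000.0, a: float = 2 * math.pi * 1000.0,
                           df: float = 1000.0, dt: float = 1e-6, t_end: float = 0.02) -> float:
    """Linearized second order PLL with proportional-plus-integral loop filter,
    driven by a frequency step of df Hz. Returns the phase error (rad) at t_end."""
    theta_i = 0.0
    theta_o = 0.0
    integral = 0.0  # running integral of the phase error
    for _ in range(int(round(t_end / dt))):
        theta_e = theta_i - theta_o
        integral += theta_e * dt
        # VCO phase rate K * (theta_e + a * integral), i.e. K G(s) applied to theta_e.
        theta_o += k * (theta_e + a * integral) * dt
        theta_i += 2 * math.pi * df * dt
    return theta_i - theta_o


if __name__ == "__main__":
    print("poles:", second_order_poles(2 * math.pi * 5000.0, 2 * math.pi * 1000.0))
    print(f"final phase error: {second_order_freq_step():.2e} rad")
```

Unlike the first order loop, the integrator absorbs the constant VCO drive $2\pi\Delta f$, so the proportional path no longer needs a standing phase error.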

Linearized analysis provides quick insight into the complexity of the phase/frequency variations that the PLL can track, as a function of the choice of loop filter and loop gain. We now take another look at the first order PLL, accounting for the $\sin(\cdot)$ nonlinearity in Figure 3.37, in order to provide a glimpse of the approach used for handling the nonlinear differential equations involved, and to compare the results with the linearized analysis.

Nonlinear model for the first order PLL: Let us try to express the phase error $\theta_e$ in terms of the input phase for a first order PLL, with $G(s) = 1$. The model of Figure 3.37 can be expressed in the time domain as:

$$K \int_0^t \sin(\theta_e(\tau))\, d\tau = \theta_o(t) = \theta_i(t) - \theta_e(t)$$

Differentiating with respect to $t$, we obtain

$$K \sin\theta_e = \frac{d\theta_i}{dt} - \frac{d\theta_e}{dt} \qquad (3.37)$$

(Both $\theta_e$ and $\theta_i$ are functions of $t$, but we suppress the dependence for notational simplicity.) Let us now specialize to the specific example of a step frequency input, for which

$$\frac{d\theta_i}{dt} = 2\pi\Delta f$$

Plugging into (3.37) and rearranging, we get

$$\frac{d\theta_e}{dt} = 2\pi\Delta f - K\sin\theta_e \qquad (3.38)$$


We cannot solve the nonlinear differential equation (3.38) for $\theta_e$ analytically, but we can get useful insight from a “phase plane plot” of $\frac{d\theta_e}{dt}$ against $\theta_e$, as shown in Figure 3.39. Since $\sin\theta_e \le 1$, we have $\frac{d\theta_e}{dt} \ge 2\pi\Delta f - K$, so that, if $\Delta f > \frac{K}{2\pi}$, then $\frac{d\theta_e}{dt} > 0$ for all $t$. Thus, for large enough frequency offset, the loop never locks. On the other hand, if $\Delta f < \frac{K}{2\pi}$, then the loop does lock. In this case, starting from some initial phase error, the phase error follows the trajectory to the right (if the derivative is positive) or left (if the derivative is negative) until it hits a point at which $\frac{d\theta_e}{dt} = 0$. From (3.38), this happens when

$$\sin\theta_e = \frac{2\pi\Delta f}{K} \qquad (3.39)$$

Due to the periodicity of the sine function, if $\theta$ is a solution to the preceding equation, so is $\theta + 2\pi$. Thus, if the equation has a solution, there must be at least one solution in the basic interval $[-\pi, \pi]$. Moreover, since $\sin\theta = \sin(\pi - \theta)$, if $\theta$ is a solution, so is $\pi - \theta$, so that there are actually two solutions in $[-\pi, \pi]$. Let us denote by $\theta_e^{(0)} = \sin^{-1}\frac{2\pi\Delta f}{K}$ the solution that lies in the interval $[-\pi/2, \pi/2]$. This forms a stable equilibrium: from (3.38), we see that the derivative is negative for phase error slightly above $\theta_e^{(0)}$, and positive for phase error slightly below $\theta_e^{(0)}$, so that the phase error is driven back to $\theta_e^{(0)}$ in each case. Using exactly the same argument, we see that the points $\theta_e^{(0)} + 2n\pi$ are also stable equilibria, where $n$ takes integer values. However, another solution to (3.39) is $\theta_e^{(1)} = \pi - \theta_e^{(0)}$, together with its translations by $2n\pi$. It is easy to see that this is an unstable equilibrium: when there is a slight perturbation, the sign of the derivative is such that it drives the phase error away from $\theta_e^{(1)}$. In general, $\theta_e^{(1)} + 2n\pi$ are unstable equilibria, where $n$ takes integer values. Thus, if the frequency offset is within the “pull-in range” $\frac{K}{2\pi}$ of the first order PLL, then the steady-state phase offset (modulo $2\pi$) is $\theta_e^{(0)} = \sin^{-1}\frac{2\pi\Delta f}{K}$, which, for small values of $\frac{2\pi\Delta f}{K}$, is approximately equal to the value $\frac{2\pi\Delta f}{K}$ predicted by the linearized analysis.

Figure 3.39: Phase plane plot for first order PLL.
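The phase plane conclusions can be confirmed by integrating (3.38) directly. A minimal sketch using forward Euler steps (our choice), with an illustrative loop gain $K = 2\pi \cdot 1000$ rad/s, i.e. a pull-in range of 1000 Hz. The returned phase error is not wrapped modulo $2\pi$, so a loop that fails to lock shows up as an ever-growing phase error (cycle slipping).

```python
import math


def first_order_nonlinear(df: float, k: float, theta_e0: float = 0.0,
                          dt: float = 1e-6, t_end: float = 0.05) -> float:
    """Forward-Euler integration of (3.38): d(theta_e)/dt = 2*pi*df - K*sin(theta_e).
    Returns theta_e at t_end, without wrapping modulo 2*pi."""
    theta_e = theta_e0
    for _ in range(int(round(t_end / dt))):
        theta_e += (2 * math.pi * df - k * math.sin(theta_e)) * dt
    return theta_e


if __name__ == "__main__":
    K = 2 * math.pi * 1000.0  # pull-in range K / (2 pi) = 1000 Hz
    for df in (100.0, 500.0, 900.0, 1500.0):
        ratio = 2 * math.pi * df / K
        final = first_order_nonlinear(df, K)
        nonlinear = math.asin(ratio) if ratio < 1.0 else float("nan")
        print(f"df = {df:6.0f} Hz: simulated {final:9.4f} rad, "
              f"asin prediction {nonlinear:.4f}, linearized {ratio:.4f}")
```

Starting instead from an initial error beyond the unstable point $\theta_e^{(1)}$ (for example `theta_e0=3.0` with $\Delta f = 500$ Hz) drives the error to the next stable equilibrium, $\theta_e^{(0)} + 2\pi$.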

Linear versus nonlinear model: Roughly speaking, the nonlinear model (which we simply simulate when phase-plane plots get too complicated) tells us when the PLL locks, while the linearized analysis provides accurate estimates when the PLL does lock. The linearized model also tells us something about scenarios when the PLL does not lock: when the phase error blows up for the linearized model, it indicates that the PLL will perform poorly. This is because the linearized model holds under the assumption that the phase error is small; if the phase error under this optimistic assumption turns out not to be small, then our initial assumption must have been wrong, and the phase error must be large.


Figure  3.40:  PLL  for  Example  3.5.1. 


The  following  worked  problem  illustrates  application  of  linearized  PLL  analysis. 


Example 3.5.1 Consider the PLL shown in Figure 3.40, assumed to be locked at time zero.

(a) Suppose that the input phase jumps by $e = 2.72$ radians at time zero (set the phase just before the jump to zero, without loss of generality). How long does it take for the difference between the PLL input phase and VCO output phase to shrink to 1 radian? (Make sure you specify the unit of time that you use.)

(b) Find the limiting value of the phase error (in radians) if the frequency jumps by 1 KHz just after time zero.

Solution: Let $\theta_e(t) = \theta_i(t) - \theta_o(t)$ denote the phase error. In the $s$ domain, it is related to the input phase as follows:

$$\Theta_i(s) - \frac{K}{s}\Theta_e(s) = \Theta_e(s)$$

so that

$$\frac{\Theta_e(s)}{\Theta_i(s)} = \frac{s}{s + K}$$

(a) For a phase jump of $e$ radians at time zero, we have $\Theta_i(s) = \frac{e}{s}$, which yields

$$\Theta_e(s) = \frac{e}{s + K}$$

Going to the time domain, we have

$$\theta_e(t) = e\, e^{-Kt} = e^{1 - Kt}$$

so that $\theta_e(t) = 1$ for $1 - Kt = 0$, or $t = \frac{1}{K} = \frac{1}{5}$ milliseconds.

(b) For a frequency jump of $\Delta f$, the Laplace transform of the input phase is given by

$$\Theta_i(s) = \frac{2\pi\Delta f}{s^2}$$

so that the phase error is given by

$$\Theta_e(s) = \frac{s}{s + K}\,\Theta_i(s) = \frac{2\pi\Delta f}{s(s + K)}$$

Using the final value theorem, we have

$$\lim_{t\to\infty}\theta_e(t) = \lim_{s\to 0} s\,\Theta_e(s) = \frac{2\pi\Delta f}{K}$$

For $\Delta f = 1$ KHz and $K = 5$ KHz/radian, this yields a phase error of $2\pi/5$ radians, or $72^\circ$.
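The arithmetic of Example 3.5.1 can be reproduced in a few lines. This sketch interprets $K = 5$ KHz/radian as $5000$ s$^{-1}$ in the loop equation $\frac{d\theta_o}{dt} = K\theta_e$, which is the interpretation consistent with both answers above; that reading is our assumption, since Figure 3.40 is not reproduced here.

```python
import math

# Assumed interpretation: K = 5 KHz/radian acts as 5000 s^-1 in d(theta_o)/dt = K * theta_e,
# so 1/K is 0.2 milliseconds.
K = 5000.0
EPS = 2.72   # phase jump in radians (close to e)
DF = 1000.0  # frequency jump in Hz

# (a) theta_e(t) = EPS * exp(-K t) reaches 1 radian at t = ln(EPS) / K.
t_a = math.log(EPS) / K
# (b) final value theorem: limiting phase error 2*pi*DF/K.
theta_b = 2 * math.pi * DF / K

print(f"(a) t = {t_a * 1e3:.4f} ms (1/K = {1e3 / K:.1f} ms)")
print(f"(b) phase error = {theta_b:.4f} rad = {math.degrees(theta_b):.1f} degrees")
```

Because 2.72 is only approximately $e$, the exact time is $\ln(2.72)/K$, slightly more than 0.2 ms, which rounds to the $1/5$ ms found above.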


3.6 Some Analog Communication Systems

Some of the analog communication systems that we encounter (or at least, used to encounter) in our daily lives include broadcast radio and television. We have already discussed AM radio in the context of the superhet receiver. We now briefly discuss FM radio and television. Our goal is to highlight design concepts, and the role played in these systems by the various modulation formats we have studied, rather than to provide a detailed technical description. Other commonly encountered examples of analog communication that we do not discuss include analog storage media (audiotapes and videotapes), analog wireline telephony, analog cellular telephony, amateur (ham) radio, and wireless microphones.


3.6.1 FM radio


Figure 3.41: Spectrum of baseband input to FM modulator for FM stereo broadcast.


FM mono radio employs a peak frequency deviation of 75 KHz, with the baseband audio message signal bandlimited to 15 KHz; this corresponds to a modulation index of 5. Using Carson's formula, the bandwidth of the FM radio signal can be estimated as 180 KHz. The separation between adjacent radio stations is 200 KHz. FM stereo broadcast transmits two audio channels, “left” and “right,” in a manner that is backwards compatible with mono broadcast, in that a standard mono receiver can extract the sum of the left and right channels, while remaining oblivious to whether the broadcast signal is mono or stereo. The structure of the baseband signal into the FM modulator is shown in Figure 3.41. The sum of the left and right channels, or the L + R signal, occupies a band from 30 Hz to 15 KHz. The difference, or the L − R signal (which also has a bandwidth of 15 KHz), is modulated using DSB-SC with a carrier frequency of 38 KHz, and hence occupies a band from 23 KHz to 53 KHz. A pilot tone at 19 KHz, half the carrier frequency for the DSB signal, is provided to enable coherent demodulation of the DSB-SC signal. The spacing between adjacent FM stereo broadcast stations is still 200 KHz, which makes it a somewhat tight fit (if we apply Carson's formula with a maximum frequency deviation of 75 KHz and a baseband bandwidth of 53 KHz, we obtain an RF bandwidth of 256 KHz).
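Both bandwidth estimates follow from Carson's formula $B \approx 2(\Delta f_{max} + W)$, with $W$ the highest baseband frequency. A minimal sketch using the numbers quoted above:

```python
def carson_bandwidth(peak_dev_hz: float, msg_bw_hz: float) -> float:
    """Carson's formula: B ~ 2 * (peak frequency deviation + message bandwidth)."""
    return 2.0 * (peak_dev_hz + msg_bw_hz)


mono = carson_bandwidth(75e3, 15e3)    # mono audio bandlimited to 15 KHz
stereo = carson_bandwidth(75e3, 53e3)  # stereo composite baseband extends to 53 KHz
print(f"modulation index (mono): {75e3 / 15e3:.0f}")
print(f"mono: {mono / 1e3:.0f} KHz, stereo: {stereo / 1e3:.0f} KHz")
```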

The format of the baseband signal in Figure 3.41 (in particular, the DSB-SC modulation of the difference signal) seems rather contrived, but the corresponding modulator can be implemented quite simply, as sketched in Figure 3.42: we simply switch between the L and R channel audio signals using a 38 KHz clock. As we show in one of the problems, this directly yields the L + R signal, plus the DSB-SC modulated L − R signal. It remains to add in the 19 KHz pilot before feeding the composite baseband signal to the FM modulator.

Figure 3.42: Block diagram of a simple FM stereo transmitter.
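Why switching works can be seen from an identity: if $c(t) \in \{0, 1\}$ selects L when it equals 1, the switch output is $c L + (1 - c) R = \frac{L+R}{2} + \frac{L-R}{2}(2c - 1)$, and the $\pm 1$ square wave $2c - 1$ has fundamental $\frac{4}{\pi}\cos(2\pi \cdot 38\text{ KHz} \cdot t)$ plus odd harmonics (114 KHz and up), which fall above the 53 KHz composite band. The sketch below checks the identity and the $4/\pi$ coefficient numerically; the clock phase alignment with the cosine is our assumption, and the helper names are hypothetical.

```python
import math

F_SW = 38e3  # switching clock frequency in Hz


def switch_state(t: float) -> int:
    """1 while the L channel is selected, 0 while R is selected.
    Assumed clock phase: 2*c - 1 is positive exactly when cos(2*pi*F_SW*t) > 0."""
    phase = (F_SW * t) % 1.0
    return 1 if (phase < 0.25 or phase >= 0.75) else 0


def switching_modulator(left: float, right: float, t: float) -> float:
    """Output of the L/R switch at time t."""
    c = switch_state(t)
    return c * left + (1 - c) * right


def fundamental_coefficient(n: int = 100_000) -> float:
    """Fourier cosine coefficient of the +/-1 square wave 2*c - 1 at F_SW,
    computed by the midpoint rule over one clock period (close to 4/pi)."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / (n * F_SW)
        total += (2 * switch_state(t) - 1) * math.cos(2 * math.pi * F_SW * t)
    return 2.0 * total / n


if __name__ == "__main__":
    print(f"fundamental coefficient {fundamental_coefficient():.6f}, 4/pi = {4 / math.pi:.6f}")
```

In this model the 38 KHz component of the switch output is $\frac{2}{\pi}(L - R)\cos(2\pi \cdot 38\text{ KHz} \cdot t)$, a DSB-SC signal, alongside $\frac{1}{2}(L + R)$ at baseband; the scale factors can be adjusted before the pilot is added.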

The receiver employs an FM demodulator to obtain an estimate of the baseband transmitted signal. The L + R signal is obtained by bandlimiting the output of the FM demodulator to 15 KHz using a lowpass filter; this is what an oblivious mono receiver would do. A stereo receiver, in addition, processes the output of the FM demodulator in the band from 15 KHz to 53 KHz. It extracts the 19 KHz pilot tone, doubles its frequency to obtain a coherent carrier reference, and uses that to demodulate the L − R signal sent using DSB-SC. It then obtains the L and R channels by adding and subtracting the L + R and L − R signals, respectively.
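The receiver steps can be sketched with elementary identities. Frequency doubling is modeled here by squaring the pilot and removing the DC term ($2\cos^2 x - 1 = \cos 2x$); this is one possible doubler, and the text does not specify the circuit. Coherent demodulation multiplies the DSB-SC signal by twice the recovered carrier and averages over whole 38 KHz cycles (standing in for the lowpass filter), with L − R held constant for simplicity. The helper names are hypothetical.

```python
import math

F_PILOT = 19e3  # pilot tone frequency in Hz


def doubled_pilot(t: float) -> float:
    """Square the 19 KHz pilot and remove DC: 2 cos^2(x) - 1 = cos(2x) gives a
    38 KHz carrier that is phase-locked to the pilot."""
    p = math.cos(2 * math.pi * F_PILOT * t)
    return 2.0 * p * p - 1.0


def demod_constant_difference(l_minus_r: float, cycles: int = 20, spc: int = 64) -> float:
    """Coherently demodulate l_minus_r * cos(2*pi*38e3*t) using the doubled pilot;
    the average over whole 38 KHz cycles stands in for the lowpass filter."""
    n = cycles * spc
    total = 0.0
    for k in range(n):
        t = k / (spc * 2 * F_PILOT)
        dsb = l_minus_r * math.cos(2 * math.pi * 2 * F_PILOT * t)
        total += dsb * 2.0 * doubled_pilot(t)
    return total / n


def stereo_matrix(l_plus_r: float, l_minus_r: float) -> tuple[float, float]:
    """Recover L and R from the sum and difference signals."""
    return (l_plus_r + l_minus_r) / 2.0, (l_plus_r - l_minus_r) / 2.0


if __name__ == "__main__":
    diff = demod_constant_difference(0.2)
    print("recovered L - R:", round(diff, 6), " (L, R):", stereo_matrix(1.0, diff))
```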


3.6.2 Analog broadcast TV

While analog broadcast TV is obsolete, and is being replaced by digital TV as we speak, we discuss it briefly here to highlight a few features. First, it illustrates an application of several modulation schemes: VSB (for intensity information), quadrature modulation (for the color information), and FM (for audio information). Second, it is an interesting example of how the embedding of different kinds of information in analog form must account for the characteristics of the information source (video) and destination (a cathode ray tube TV monitor). This customized, and rather painful, design process is in contrast to the generality and conceptual simplification provided by the source-channel separation principle in digital communication (mentioned in Chapter 1). Indeed, from Chapter 4 onwards, where we restrict attention to digital communication, we do not need to discuss source characteristics.

We first need a quick discussion of CRT TV monitors. An electron beam impinging on a fluorescent screen is used to emit the light that we perceive as the image on the TV. The electron beam is “raster scanned” in horizontal lines moving down the screen, with its horizontal and vertical location controlled by two magnetic fields created by voltages, as shown in Figure 3.43. We rely on the persistence of human vision to piece together these discrete scans into a continuous image in space and time. Black and white TV monitors use a phosphor (or fluorescent material) that emits white light when struck by electrons. Color TV monitors use three kinds of phosphors, typically arranged as dots on the screen, which emit red, green and blue light, respectively, when struck by electrons. Three electron beams are used, one for each color. The intensity of the emitted light is controlled by the intensity of the electron beam. For historical reasons, the vertical scan rate is chosen to be equal to the frequency of the AC power (otherwise, for the power supplies used at the time, rolling bars would appear on the TV screen). In the United States, this means that the vertical scan rate is set at 60 Hz (the frequency of the AC mains).

In order to enable the TV receiver to control the operation of the CRT monitor, the received signal must contain not only intensity and color information, but also the timing information required to correctly implement the raster scan. Figure 3.44 shows the format of the composite video signal containing this information. In order to reduce flicker (again a historical legacy, since older CRT monitors could not maintain intensities long enough if the time between refreshes was too long), the CRT screen is painted in two rounds for each image (or frame): first the odd lines (comprising the odd field) are scanned, then the even lines (comprising the even field) are scanned. For the NTSC standard, this is done at a rate of 60 fields per second, or 30 frames per second. A horizontal sync pulse is inserted between successive lines. A more complex vertical synchronization waveform is inserted between successive fields; this enables vertical synchronization (as well as other functionalities that we do not discuss here). The receiver can extract the horizontal and vertical timing information from the composite video signal, and generate the sawtooth waveforms required for controlling the electron beam (one of the first widespread commercial applications of the PLL was for this purpose). For the NTSC standard, the composite video signal spans 525 lines, about 486 of which are actually painted (counting both the even and odd fields). The remaining 39 lines accommodate the vertical synchronization waveforms.

Figure 3.43: Implementing raster scan in a CRT monitor requires magnetic fields controlled by sawtooth waveforms. (Panels: CRT schematic with electron beam trajectory; raster scan pattern; horizontal and vertical position controls needed for raster scan.)

Figure 3.44: The structure of a black and white composite video signal (numbers apply to the NTSC standard). (Video information for the odd field, lines 1, 3, ..., 479, followed by video information for the even field, lines 2, 4, ..., 480.)
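These numbers fix the horizontal timing that the receiver's sync circuitry (for example, a PLL) must track. The following is a minimal arithmetic sketch. It uses the nominal monochrome values from the text and ignores the slightly lower rates adopted for color NTSC.

```python
# Nominal monochrome NTSC values from the text (color NTSC rates are slightly lower).
LINES_PER_FRAME = 525
FRAMES_PER_SEC = 30
FIELDS_PER_FRAME = 2  # odd field followed by even field

line_rate = LINES_PER_FRAME * FRAMES_PER_SEC         # horizontal lines per second
line_period_us = 1e6 / line_rate                      # duration of one line in microseconds
field_rate = FRAMES_PER_SEC * FIELDS_PER_FRAME        # fields per second
lines_per_field = LINES_PER_FRAME / FIELDS_PER_FRAME  # half-integer, so the fields interlace

print(line_rate, round(line_period_us, 2), field_rate, lines_per_field)
# prints: 15750 63.49 60 262.5
```

The 262.5 lines per field is what makes the two fields interlace. Each field ends halfway through a line, so the lines of the next field fall between those of the previous one.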

The  bandwidth  of  the  baseband  video  signal  can  be  roughly  estimated  as  follows.  Assuming 
about  480  lines,  with  about  640  pixels  per  line  (for  an  aspect  ratio  of  4:3),  we  have  about  300,000 
pixels,  refreshed  at  the  rate  of  30  times  per  second.  Thus,  our  overall  sampling  rate  is  about 
9  Msamples/second.  This  can  accurately  represent  a  signal  of  bandwidth  4.5  MHz.  For  a  6 
MHz  TV  channel  bandwidth,  DSB  and  wideband  FM  are  therefore  out  of  the  question,  and 
VSB  was  chosen  to  modulate  the  composite  video  signal.  However,  the  careful  shaping  of  the 
spectrum  required  for  VSB  is  not  carried  out  at  the  transmitter,  because  this  would  require 
the  design  of  high-power  electronics  with  tight  specifications.  Instead,  the  transmitter  uses  a 
simple  filter,  while  the  receiver,  which  deals  with  a  low-power  signal,  accomplishes  the  VSB 
shaping  requirement  in  (3.21).  Audio  modulation  is  done  using  FM  in  a  band  adjacent  to  the 
one  carrying  the  video  signal. 
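The bandwidth estimate above is simple arithmetic, and the sketch below reproduces it. It uses the text's assumptions of 480 lines, a 4:3 aspect ratio, 30 frames per second and a 6 MHz channel. The function name is hypothetical.

```python
def video_bandwidth(lines=480, aspect=(4, 3), frames_per_sec=30):
    """Rough baseband video bandwidth from a pixel count (Nyquist argument)."""
    pixels_per_line = lines * aspect[0] // aspect[1]  # 640 for 480 lines at 4:3
    pixels_per_frame = lines * pixels_per_line        # about 300,000 pixels
    sample_rate = pixels_per_frame * frames_per_sec   # samples per second
    bandwidth = sample_rate / 2                       # Nyquist: rate = 2 x bandwidth
    return pixels_per_frame, sample_rate, bandwidth


pixels, rate, bw = video_bandwidth()
channel = 6e6                          # TV channel bandwidth in Hz
print(pixels, rate, bw)                # 307200 9216000 4608000.0
print("DSB fits:", 2 * bw <= channel)  # DSB needs twice the baseband bandwidth: False
```

A DSB signal would occupy about 9.2 MHz, which exceeds the 6 MHz channel. This is why the much more bandwidth-efficient VSB is used.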

While the signaling for black and white TV is essentially the same for all existing analog TV standards, the insertion of color differs among standards such as NTSC, PAL and SECAM. We do not go into details here, but, taking NTSC as an example, we note that the frequency domain characteristics of the black and white composite video signal are exploited in a rather clever way to insert color information. The black and white signal exhibits a clustering of power around the Fourier series components corresponding to the horizontal scan rate (that is, around multiples of the line rate), with the power decaying around the higher order harmonics. The color modulated signal uses the same band as the black and white signal, but its subcarrier is placed between two such harmonics, so as to minimize the mutual interference between the intensity information and the color information. The color information is encoded in two baseband signals, which are modulated onto the I and Q components using QAM. Synchronization information that permits coherent recovery of the color subcarrier for quadrature demodulation is embedded in the synchronization waveform: a short burst of the color subcarrier is sent after each horizontal sync pulse.
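For concreteness, consider the standard NTSC color values, which are not given in the text above. The color subcarrier sits at 455/2 times the horizontal line rate, and the color line rate is 4.5 MHz/286, or about 15.734 KHz. The sketch below uses exact rational arithmetic to check that the subcarrier lands halfway between the 227th and 228th harmonics of the line rate. This is the interleaving described above.

```python
from fractions import Fraction

SOUND_OFFSET_HZ = Fraction(4_500_000)  # audio carrier offset from the video carrier
f_line = SOUND_OFFSET_HZ / 286         # color NTSC horizontal line rate (Hz)
f_sc = Fraction(455, 2) * f_line       # color subcarrier: an odd multiple of f_line / 2

ratio = f_sc / f_line                  # position of the subcarrier in units of f_line
print(float(f_line), float(f_sc), ratio)
```

Because the ratio is 455/2, the color signal's own spectral clusters fall in the gaps between the clusters of the intensity signal.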


3.7  Problems 


Amplitude  modulation 

Problem  3.1  Figure  3.45  shows  a  signal  obtained  after  amplitude  modulation  by  a  sinusoidal 
message.  The  carrier  frequency  is  difficult  to  determine  from  the  figure,  and  is  not  needed  for 
answering  the  questions  below. 

(a)  Find  the  modulation  index. 

(b)  Find  the  signal  power. 

(c)  Find  the  bandwidth  of  the  AM  signal. 


Figure 3.45: Amplitude modulated signal for Problem 3.1.

Problem 3.2 Consider a message signal $m(t) = 2\cos(2\pi t + \ldots)$.

(a) Sketch the spectrum $U(f)$ of the DSB-SC signal $u_p(t) = 8m(t)\cos 400\pi t$. What is the power of $u$?

(b) Carefully sketch the output of an ideal envelope detector with input $u_p$. On the same plot, sketch the message signal $m(t)$.

(c) Let $v_p(t)$ denote the waveform obtained by high-pass filtering the signal $u_p(t)$ so as to let through only frequencies above 200 Hz. Find $v_c(t)$ and $v_s(t)$ such that we can write

$$v_p(t) = v_c(t)\cos 400\pi t - v_s(t)\sin 400\pi t$$

and sketch the envelope of $v$.

Problem  3.3  A  message  to  be  transmitted  using  AM  is  given  by 

$$m(t) = 3\cos 2\pi t + 4\sin 6\pi t$$

where  the  unit  of  time  is  milliseconds.  It  is  to  be  sent  using  a  carrier  frequency  of  600  KHz. 

(a)  What  is  the  message  bandwidth?  Sketch  its  magnitude  spectrum,  clearly  specifying  the  units 
used  on  the  frequency  axis. 

(b) Find an expression for the normalized message $m_n(t)$.

(c)  For  a  modulation  index  of  50%,  write  an  explicit  time  domain  expression  for  the  AM  signal. 

(d)  What  is  the  power  efficiency  of  the  AM  signal? 

(e)  Sketch  the  magnitude  spectrum  for  the  AM  signal,  again  clearly  specifying  the  units  used  on 
the  frequency  axis. 

(f) The AM signal is to be detected using an envelope detector (as shown in Figure 3.8), with $R = 50$ ohms. What is a good range of choices for the capacitance $C$?

Problem 3.4 Consider a message signal $m(t) = \cos(2\pi f_m t + \phi)$, and a corresponding DSB-SC signal $u_p(t) = A m(t)\cos 2\pi f_c t$, where $f_c > f_m$.

(a) Sketch the spectra of the corresponding LSB and USB signals (if the spectrum is complex-valued, sketch the real and imaginary parts separately).

(b)  Find  explicit  time  domain  expressions  for  the  LSB  and  USB  signals. 

Problem 3.5 One way of avoiding the use of a mixer in generating AM is to pass $x(t) = m(t) + a\cos 2\pi f_c t$ through a memoryless nonlinearity and then a bandpass filter.

(a) Suppose that $M(f) = (1 - |f|/10)\, I_{[-10,10]}(f)$ (the unit of frequency is KHz) and $f_c$ is 900 KHz. For a nonlinearity $f(x) = \beta x^2 + x$, sketch the magnitude spectrum at the output of the nonlinearity when the input is $x(t)$, carefully labeling the frequency axis.

(b) For the specific settings in (a), characterize the bandpass filter that you should use at the output of the nonlinearity so as to generate an AM signal carrying the message $m(t)$. That is, describe the set of frequencies that the BPF must reject, and those that it must pass.


Problem 3.6 Consider a DSB signal corresponding to the message $m(t) = \mathrm{sinc}(2t)$ and a carrier frequency $f_c$ which is 100 times larger than the message bandwidth, where the unit of time is milliseconds.

(a) Sketch the magnitude spectrum of the DSB signal $10m(t)\cos 2\pi f_c t$, specifying the units on the frequency axis.

(b) Specify a time domain expression for the corresponding LSB signal.

(c) Now, suppose that the DSB signal is passed through a bandpass filter whose transfer function is given by

$$H_p(f) = \begin{cases} f - f_c + \frac{1}{2}, & f_c - \frac{1}{2} \le f \le f_c + \frac{1}{2} \\ 1, & f > f_c + \frac{1}{2} \end{cases} \qquad f > 0$$

Sketch the magnitude spectrum of the corresponding VSB signal.

(d) Find a time domain expression for the VSB signal of the form

$$u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t$$

carefully specifying $u_c$ and $u_s$, the I and Q components.


Figure 3.46: Block diagram of Weaver’s SSB modulator for Problem 3.7.


Problem 3.7 Figure 3.46 shows a block diagram of Weaver’s SSB modulator, which works if we choose $f_1$, $f_2$ and the bandwidth of the lowpass filter appropriately. Let us work through these choices for a waveform of the form $m(t) = A_L\cos(2\pi f_L t + \phi_L) + A_H\cos(2\pi f_H t + \phi_H)$, where $f_H > f_L$ (the design choices we obtain will work for any message whose spectrum lies in the band $[f_L, f_H]$).

(a) For $f_1 = (f_L + f_H)/2$ (i.e., choosing the first LO frequency to be in the middle of the message band), find the time domain waveforms at the outputs of the upper and lower branches after the first mixer.

(b) Choose the bandwidth of the lowpass filter to be $(f_H - f_L)/2$ (assume the lowpass filter is ideal). Find the time domain waveforms at the outputs of the upper and lower branches after the LPF.

(c) Now, assuming that $f_2 \gg f_H$, find a time domain expression for the output waveform, assuming that the upper and lower branches are added together. Is this an LSB or USB waveform? What is the carrier frequency?

(d) Repeat (c) when the lower branch is subtracted from the upper branch.

Remark:  Weaver’s  modulator  does  not  require  bandpass  filters  with  sharp  cutoffs,  unlike  the 
direct  approach  to  generating  SSB  waveforms  by  filtering  DSB-SC  waveforms.  It  is  also  simpler 
than the Hilbert transform method (the latter requires implementation of a $\pi/2$ phase shift over the entire message band).


Figure 3.47: Bandpass filter for Problem 3.8.


Problem 3.8 Consider the AM signal $u_p(t) = 2(10 + \cos 2\pi f_m t)\cos 2\pi f_c t$, where the message frequency $f_m$ is 1 MHz and the carrier frequency $f_c$ is 885 MHz.

(a) Suppose that we use superheterodyne reception with an IF of 10.7 MHz, and envelope detection after the IF filter. Envelope detection is accomplished as in Figure 3.8, using a diode and an RC circuit. What would be a good choice of $C$ if $R = 100$ ohms?

(b) The AM signal $u_p(t)$ is passed through the bandpass filter with transfer function $H_p(f)$ depicted (for positive frequencies) in Figure 3.47. Find the I and Q components of the filter output with respect to reference frequency $f_c$ of 885 MHz. Does the filter output represent a form of modulation you are familiar with?


Problem 3.9 Consider a message signal $m(t)$ with spectrum $M(f) = I_{[-2,2]}(f)$.

(a) Sketch the spectrum of the DSB-SC signal $u_{DSB\text{-}SC}(t) = 10m(t)\cos 300\pi t$. What are the power and bandwidth of $u$?

(b)  The  signal  in  (a)  is  passed  through  an  envelope  detector.  Sketch  the  output,  and  comment 
on  how  it  is  related  to  the  message. 

(c) What is the smallest value of $A$ such that the message can be recovered without distortion from the AM signal $u_{AM}(t) = (A + m(t))\cos 300\pi t$ by envelope detection?

(d) Give a time-domain expression of the form

$$u_p(t) = u_c(t)\cos 300\pi t - u_s(t)\sin 300\pi t$$

obtained by high-pass filtering the DSB signal in (a) so as to let through only frequencies above 150 Hz.

(e) Consider a VSB signal constructed by passing the signal in (a) through a passband filter with transfer function for positive frequencies specified by:

$$H_p(f) = \begin{cases} f - 149, & 149 \le f \le 151 \\ 2, & f > 151 \end{cases}$$

(you should be able to sketch $H_p(f)$ for both positive and negative frequencies.) Find a time domain expression for the VSB signal of the form

$$u_p(t) = u_c(t)\cos 300\pi t - u_s(t)\sin 300\pi t$$
Problem 3.10 Consider Figure 3.17 depicting VSB spectra. Suppose that the passband VSB filter $H_p(f)$ is specified (for positive frequencies) as follows:

$$H_p(f) = \begin{cases} 1, & 101 \le f \le 102 \\ \frac{1}{2}(f - 99), & 99 \le f \le 101 \\ 0, & \text{else} \end{cases}$$

(a) Sketch the passband transfer function $H_p(f)$ for both positive and negative frequencies.

(b) Sketch the spectrum of the complex envelope $H(f)$, taking $f_c = 100$ as a reference.

(c) Sketch the spectra (show the real and imaginary parts separately) of the I and Q components of the impulse response of the passband filter.

(d) Consider a message signal of the form $m(t) = 4\,\mathrm{sinc}(4t) - 2\cos 2\pi t$. Sketch the spectrum of the DSB signal that results when the message is modulated by a carrier at $f_c = 100$.

(e) Now, suppose that the DSB signal in (d) is passed through the VSB filter in (a)-(c). Sketch the spectra of the I and Q components of the resulting VSB signal, showing the real and imaginary parts separately.

(f)  Find  a  time  domain  expression  for  the  Q  component. 


Problem  3.11  Consider  the  periodic  signal  m{t)  =  ~  2’^)’  where  p{t)  =  t/[_i  i](t). 

(a) Sketch the AM signal $x(t) = (4 + m(t))\cos 100\pi t$.

(b)  What  is  the  power  efficiency? 


Superheterodyne  reception 

Problem  3.12  A  dual  band  radio  operates  at  900  MHz  and  1.8  GHz.  The  channel  spacing  in 
each  band  is  1  MHz.  We  wish  to  design  a  superheterodyne  receiver  with  an  IF  of  250  MHz.  The 
LO  is  built  using  a  frequency  synthesizer  that  is  tunable  from  1.9  to  2.25  GHz,  and  frequency 
divider  circuits  if  needed  (assume  that  you  can  only  implement  frequency  division  by  an  integer). 

(a) How would you design a superhet receiver to receive a passband signal restricted to the band 1800-1801 MHz? Specify the characteristics of the RF and IF filters, and how you would choose and synthesize the LO frequency.

(b)  Repeat  (a)  when  the  signal  to  be  received  lies  in  the  band  900-901  MHz. 


Angle  modulation 

Problem  3.13  Figure  3.48  shows,  as  a  function  of  time,  the  phase  deviation  of  a  bandpass  FM 
signal  modulated  by  a  sinusoidal  message. 

(a) Find the modulation index (assume that it is an integer multiple of $\pi$ for your estimate).

(b)  Find  the  message  bandwidth. 

(c)  Estimate  the  bandwidth  of  the  FM  signal  using  Carson’s  formula. 


Figure 3.48: Phase deviation of FM signal for Problem 3.13. (Vertical axis: phase deviation, with ticks at −600 and 600; horizontal axis: time in milliseconds, from 0 to 1.)

Problem 3.14 The input $m(t)$ to an FM modulator with $k_f = 1$ has Fourier transform

The output of the FM modulator is given by

$$u(t) = A\cos(2\pi f_c t + \phi(t))$$

where $f_c$ is the carrier frequency.

(a) Find an explicit time domain expression for $\phi(t)$ and carefully sketch $\phi(t)$ as a function of time.

(b) Find the magnitude of the instantaneous frequency deviation from the carrier at time $t = \ldots$

(c) Using the result from (b) as an approximation for the maximum frequency deviation, estimate the bandwidth of $u(t)$.

Problem 3.15 Let $p(t) = I_{[-\frac{1}{2},\frac{1}{2}]}(t)$ denote a rectangular pulse of unit duration. Construct the signal

$$m(t) = \sum_{n=-\infty}^{\infty} (-1)^n\, p(t - n)$$

The signal $m(t)$ is input to an FM modulator, whose output is given by

$$u(t) = 20\cos(2\pi f_c t + \phi(t))$$

where

$$\phi(t) = 20\pi \int_{-\infty}^{t} m(\tau)\,d\tau + a$$

and $a$ is chosen such that $\phi(0) = 0$.

(a) Carefully sketch both $m(t)$ and $\phi(t)$ as a function of time.

(b) Approximating the bandwidth of $m(t)$ as $W \approx 2$, estimate the bandwidth of $u(t)$ using Carson’s formula.

(c) Suppose that a very narrow ideal BPF (with bandwidth less than 0.1) is placed at $f_c + \alpha$. For which (if any) of the following choices of $\alpha$ will you get nonzero power at the output of the BPF: (i) $\alpha = 0.5$, (ii) $\alpha = 0.75$, (iii) $\alpha = 1$?

Problem 3.16 Let $u(t) = 20\cos(2000\pi t + \phi(t))$ denote an angle modulated signal.

(a) For $\phi(t) = 0.1\cos 2\pi t$, what is the approximate bandwidth of $u$?

(b) Let $y(t) = \ldots$. Specify the frequency bands spanned by $y(t)$. In particular, specify the output when $y$ is passed through:

(i) A BPF centered at 12 KHz. Using Carson’s formula, determine the bandwidth of the BPF required to recover most of the information in $\phi$ from the output.

(ii) An ideal LPF of bandwidth 200 Hz.

(iii) A BPF of bandwidth 100 Hz centered at 11 KHz.

(c) For $\phi(t) = 2\sum_n s(t - 2n)$, where $s(t) = (1 - \ldots)$:

(i) Sketch the instantaneous frequency deviation from the carrier frequency of 1 KHz.

(ii) Show that we can write

$$u(t) = \sum_n c_n \cos(2000\pi t + n a t)$$

Specify $a$, and write down an explicit integral expression for $c_n$.

Problem  3.17  Consider  the  set-up  of  Problem  3.15,  taking  the  unit  of  time  in  milliseconds  for 
concreteness.  You  do  not  need  the  value  of  fc,  but  you  can  take  it  to  be  1  MHz. 

(a) Numerically (e.g., using Matlab) compute the Fourier series expansion for the complex envelope of the FM waveform, in the same manner as was done for a sinusoidal message. Report the magnitudes of the Fourier series coefficients for the first 5 harmonics.

(b)  Find  the  90%,  95%  and  99%  power  containment  bandwidths.  Compare  with  the  estimate 
from  Carson’s  formula  obtained  in  Problem  3.15(b). 
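The numerical procedure in this problem can also be prototyped in Python instead of Matlab. This is a minimal sketch and the function names are hypothetical. It is shown for a sinusoidal phase $\beta \sin 2\pi f_m t$, where the coefficients should equal the Bessel function values $J_n(\beta)$, rather than for the square-wave message of Problem 3.15. The sketch samples one period of the complex envelope $e^{j\phi(t)}$ and uses an FFT to obtain its Fourier series coefficients. It then finds the smallest symmetric set of harmonics that holds a given fraction of the power.

```python
import numpy as np


def fm_fourier_coeffs(phase, num_samples=1024):
    """Fourier series coefficients c_n of the complex envelope exp(j*phase).

    `phase` maps normalized time s in [0, 1) (one message period) to radians.
    Returns a dict {n: c_n} for -num_samples/2 < n < num_samples/2.
    """
    s = np.arange(num_samples) / num_samples
    envelope = np.exp(1j * phase(s))        # unit-amplitude complex envelope
    c = np.fft.fft(envelope) / num_samples  # c[n]: weight of exp(j*2*pi*n*s)
    half = num_samples // 2
    return {n: c[n % num_samples] for n in range(-half + 1, half)}


def containment_bandwidth(coeffs, fraction, f_m):
    """Smallest 2*K*f_m such that harmonics |n| <= K hold `fraction` (< 1) of the power."""
    total = sum(abs(v) ** 2 for v in coeffs.values())
    k = 0
    while sum(abs(coeffs[n]) ** 2 for n in range(-k, k + 1)) < fraction * total:
        k += 1
    return 2 * k * f_m


beta, f_m = 2.404825557695773, 1.0  # beta at the first zero of J_0
coeffs = fm_fourier_coeffs(lambda s: beta * np.sin(2 * np.pi * s))
carson = 2 * (beta + 1) * f_m       # Carson's formula estimate
for n in range(6):
    print(n, round(abs(coeffs[n]), 4))
for frac in (0.90, 0.95, 0.99):
    print(frac, containment_bandwidth(coeffs, frac, f_m), round(carson, 2))
```

For the square-wave message of Problem 3.15, only the `phase` function passed in would change. Harmonic $n$ of the complex envelope sits at $f_c + n f_m$ in the passband signal.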

Problem 3.18 A VCO with a quiescent frequency of 1 GHz, with a frequency sweep of 2 MHz/mV, produces an angle modulated signal whose phase deviation $\theta(t)$ from a carrier frequency $f_c$ of 1 GHz is shown in Figure 3.49.


Figure  3.49:  Set-up  for  Problem  3.18. 


(a) Sketch the input $m(t)$ to the VCO, carefully labeling both the voltage and time axes.

(b) Estimate the bandwidth of the angle modulated signal at the VCO output. You may approximate the bandwidth of a periodic signal by that of its first harmonic.


Uncategorized  problems 

Problem 3.19 The signal $m(t) = 2\cos 20\pi t - \cos 10\pi t$, where the unit of time is milliseconds, and the unit of amplitude is millivolts (mV), is fed to a VCO with quiescent frequency of 5 MHz and frequency deviation of 100 KHz/mV. Denote the output of the VCO by $y(t)$.

(a) Provide an estimate of the bandwidth of $y$.

(b) The signal $y(t)$ is passed through an ideal bandpass filter of bandwidth 5 KHz, centered at 5.005 MHz. Describe in detail how you would compute the power at the filter output (if you can compute the power in closed form, do so).


Problem 3.20 Consider the AM signal $u_p(t) = (A + m(t))\cos 100\pi t$ ($t$ in ms) with message signal $m(t)$ as in Figure 3.50, where $A$ is 10 mV.

(a) If the AM signal is demodulated using an envelope detector with an RC filter, how should you choose $C$ if $R = 500$ ohms? Try to ensure that the first harmonic (i.e., the fundamental) and the third harmonic of the message are reproduced with minimal distortion.

(b) Now, consider an attempt at synchronous demodulation, where the AM signal is downconverted using a 201 KHz LO, as shown in Figure 3.51. Find and sketch the I and Q components, $u_c(t)$ and $u_s(t)$, for $0 \le t \le 2$ ($t$ in ms).

Figure 3.50: Message signal for Problems 3.20 and 3.21. (A square wave alternating between 10 mV and −10 mV; time axis $t$ in ms, with ticks at 0, 1, 2, 3.)

Figure 3.51: Downconversion using 201 KHz LO ($t$ in ms in the figure) for Problem 3.20(b)-(c). (Outputs: $u_c(t)$ and $u_s(t)$.)

(c) Describe how you would recover the original message $m(t)$ from the downconverter outputs $u_c(t)$ and $u_s(t)$, drawing block diagrams as needed.

Problem 3.21 The square wave message signal $m(t)$ in Figure 3.50 is input to a VCO with quiescent frequency 200 KHz and frequency deviation 1 KHz/mV. Denote the output of the VCO by $u_p(t)$.

(a)  Sketch  the  I  and  Q  components  of  the  FM  signal  (with  respect  to  a  frequency  reference  of 
200  KHz  and  a  phase  reference  chosen  such  that  the  phase  is  zero  at  time  zero)  over  the  time 
interval  0  <  t  <  2  (t  in  ms),  clearly  labeling  the  axes. 

(b) In order to extract the I and Q components using a standard downconverter (mix with LO and then lowpass filter), how would you choose the bandwidth of the LPFs used at the mixer outputs?


Figure  3.52:  Phase  Evolution  in  Problem  3.22. 


Problem 3.22 The output of an FM modulator is the bandpass signal $y(t) = 10\cos(300\pi t + \phi(t))$, where the unit of time is milliseconds, and the phase $\phi(t)$ is as sketched in Figure 3.52.

(a) Suppose that $y(t)$ is the output of a VCO with frequency deviation 1 KHz/mV and quiescent frequency 149 KHz. Find and sketch the input to the VCO.

(b) Use Carson’s formula to estimate the bandwidth of $y(t)$, clearly stating the approximations that you make.


Phase  locked  loop 

Set-up for PLL problems: For the next few problems on PLL modeling and analysis, consider the linearized model in Figure 3.38, with the following notation: loop filter $G(s)$, loop gain $K$, and VCO modeled as $1/s$. Recall from your background on signals and systems that a second order system whose transfer function has denominator $s^2 + 2\zeta\omega_n s + \omega_n^2$ has natural frequency $\omega_n$ (in radians/second) and damping factor $\zeta$.
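The following example shows how a closed-loop transfer function is matched to this standard form. It is a minimal sketch that assumes a proportional-plus-integral loop filter $G(s) = (s + a)/s$. This filter is an assumption made here; the loop filters in the individual problems are specified there. With loop gain $K$ and VCO $1/s$, the gain from input phase to VCO output phase is $H(s) = K G(s)/(s + K G(s)) = K(s + a)/(s^2 + K s + K a)$. Matching denominators gives $\omega_n^2 = K a$ and $2\zeta\omega_n = K$. The function names are hypothetical.

```python
import math


def pll_second_order_params(K, a):
    """Natural frequency and damping for H(s) = K (s + a) / (s^2 + K s + K a).

    Assumes loop filter G(s) = (s + a)/s, loop gain K, VCO 1/s.
    """
    w_n = math.sqrt(K * a)  # from w_n^2 = K a
    zeta = K / (2 * w_n)    # from 2 zeta w_n = K
    return w_n, zeta


def pll_gains_for(w_n, zeta):
    """Inverse mapping under the same assumption: K = 2 zeta w_n, a = w_n / (2 zeta)."""
    return 2 * zeta * w_n, w_n / (2 * zeta)


w_n, zeta = pll_second_order_params(K=4.0, a=1.0)  # gives w_n = 2 rad/s, zeta = 1
```

Note that $K$ and $a$ both carry units of radians/second here, since $K a = \omega_n^2$ and $K = 2\zeta\omega_n$.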


Problem 3.23 Let $H(s)$ denote the gain from the PLL input to the output of the VCO. Let $H_e(s)$ denote the gain from the PLL input to the input to the loop filter. Let $H_m(s)$ denote the gain from the PLL input to the VCO input.

(a) Write down the formulas for $H(s)$, $H_e(s)$, $H_m(s)$, in terms of $K$ and $G(s)$.

(b) Which is the relevant transfer function if the PLL is being used for FM demodulation?

(c) Which is the relevant transfer function if the PLL is being used for carrier phase tracking?

(d) For $G(s) = \ldots$ and $K = 2$, write down expressions for $H(s)$, $H_e(s)$ and $H_m(s)$. What are the natural frequency and the damping factor?


Problem  3.24  Suppose  the  PLL  input  exhibits  a  frequency  jump  of  1  KHz. 

(a)  How  would  you  choose  the  loop  gain  K  for  a  first  order  PLL  (G(s)  =  1)  to  ensure  a  steady 
state  error  of  at  most  5  degrees? 

(b) How would you choose the parameters $a$ and $K$ for a second order PLL ($G(s) = \ldots$) to have a natural frequency of 1.414 KHz and a damping factor of $\ldots$? Specify the units for $a$ and $K$.

(c)  For  the  parameter  choices  in  (b),  find  and  roughly  sketch  the  phase  error  as  a  function  of 
time  for  a  frequency  jump  of  1  KHz. 


Problem  3.25  Suppose  that  G{s)  =  and  K  =  4. 

(a)  Find  the  transfer  function  ^44. 

^  ''  ©i(s) 

(b) Suppose that the PLL is used for FM demodulation, with the input to the PLL being an FM signal with instantaneous frequency deviation $\ldots$, where the message is $m(t) = 2\cos t + \sin 2t$. Using the linearized model for the PLL, find a time domain expression for the estimated message provided by the PLL-based demodulator.

Hint: What happens to a sinusoid of frequency $\omega$ passing through a linear system with transfer function $H(s)$?


Problem 3.26 Consider the PLL depicted in Figure 3.53, with input phase $\theta(t)$. The output signal of interest to us here is $v(t)$, the VCO input. The parameter for the loop filter $G(s)$ is given by $a = 1000\pi$ radians/sec.

(a) Assume that the PLL is locked at time 0, and suppose that $\theta(t) = 1000\pi t\, I_{\{t > 0\}}$. Find the limiting value of $v(t)$.

(b) Now, suppose that $\theta(t) = 4\pi\sin 1000\pi t$. Find an approximate expression for $v(t)$. For full credit, simplify as much as possible.

(c) For part (b), estimate the bandwidth of the passband signal at the PLL input.

Figure 3.53: System for Problem 3.26.


Quiz  on  analog  communication  systems 

Problem 3.27 Answer the following questions regarding commercial analog communication systems (some of which may no longer exist in your neighborhood).

(a)  (True  or  False)  The  modulation  format  for  analog  cellular  telephony  was  conventional  AM. 

(b)  (Multiple  choice)  FM  was  used  in  analog  TV  as  follows: 

(i)  to  modulate  the  video  signal 

(ii)  to  modulate  the  audio  signal 

(iii) FM was not used in analog TV systems.

(c)  A  superheterodyne  receiver  for  AM  radio  employs  an  intermediate  frequency  (IF)  of  455  KHz, 
and  has  stations  spaced  at  10  KHz.  Comment  briefly  on  each  of  the  following  statements: 

(i)  The  AM  band  is  small  enough  that  the  problem  of  image  frequencies  does  not  occur. 

(ii)  A  bandwidth  of  20  KHz  for  the  RF  front  end  is  a  good  choice. 

(iii) A bandwidth of 20 KHz for the IF filter is a good choice.

Chapter 4

Digital  Modulation 


Figure 4.1: Running example: Binary antipodal signaling using a timelimited pulse. (The bit sequence ...0110100... is mapped to a waveform with symbol interval T.)


Digital modulation is the process of translating bits to analog waveforms that can be sent over a physical channel. Figure 4.1 shows an example of a baseband digitally modulated waveform, where bits that take values in {0, 1} are mapped to symbols in {+1, −1}, which are then used to modulate translates of a rectangular pulse, where the translation corresponding to successive symbols is the symbol interval T. The modulated waveform can be represented as a sequence of symbols (taking values ±1 in the example) multiplying translates of a pulse (rectangular in the example). This is an example of a widely used form of digital modulation termed linear modulation, where the transmitted signal depends linearly on the symbols to be sent. Our treatment of linear modulation in this chapter generalizes this example in several ways. The modulated signal in Figure 4.1 is a baseband signal, but what if we are constrained to use a passband channel (e.g., a wireless cellular system operating at 900 MHz)? One way to handle this is to simply translate this baseband waveform to passband by upconversion; that is, send $u_p(t) = u(t)\cos 2\pi f_c t$, where the carrier frequency $f_c$ lies in the desired frequency band. However, what if the frequency occupancy of the passband signal is strictly constrained? (Such constraints are often the result of guidelines from standards or regulatory bodies, and serve to limit interference between users operating in adjacent channels.) Clearly, the timelimited modulation pulse used in Figure 4.1 spreads out significantly in frequency. We must therefore learn to work with modulation pulses which are better constrained in frequency. We may also wish to send information on both the I and Q components. Finally, we may wish to pack in more bits per symbol; for example, we could send 2 bits per symbol by using 4 levels, say {±1, ±3}.
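The running example can be generated in a few lines. This is a minimal sketch with hypothetical helper names, assuming time is measured in units of the symbol interval ($T = 1$) and each interval is sampled at `sps` points. It maps bits to symbols as in Figure 4.1 (bit 0 to +1, bit 1 to −1). It then forms the baseband waveform with a rectangular pulse and upconverts it to passband.

```python
import numpy as np


def bits_to_symbols(bits):
    """Binary antipodal mapping from the text: bit 0 -> +1, bit 1 -> -1."""
    return 1 - 2 * np.asarray(bits)


def linear_mod_rect(symbols, samples_per_symbol):
    """Sampled u(t) = sum_n b[n] p(t - nT) with a rectangular pulse of duration T."""
    return np.repeat(symbols, samples_per_symbol)


def upconvert(u, fc, fs):
    """Passband signal u_p(t) = u(t) cos(2 pi fc t), sampled at rate fs."""
    t = np.arange(len(u)) / fs
    return u * np.cos(2 * np.pi * fc * t)


bits = [0, 1, 1, 0, 1, 0, 0]        # the bit pattern shown in Figure 4.1
sps = 8                             # samples per symbol interval T (assumed)
u = linear_mod_rect(bits_to_symbols(bits), sps)
u_p = upconvert(u, fc=2.0, fs=sps)  # fc = 2/T, with time measured in units of T
```

Plotting `u` and `u_p` against `np.arange(len(u)) / sps` shows the sharp symbol transitions of the rectangular pulse. These transitions are the source of the spectral spreading noted above.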

Chapter plan: In Section 4.1, we develop an understanding of the structure of linearly modulated signals. We use the binary modulation in Figure 4.1 to lead into variants of this example, corresponding to different signaling constellations that can be used for baseband and passband channels. In Section 4.2, we discuss how to quantify the bandwidth of linearly modulated signals by computing the power spectral density. With these basic insights in place, we turn in Section 4.3 to a discussion of modulation for bandlimited channels, treating signaling over baseband and passband channels in a unified framework using the complex baseband representation. We note, invoking Nyquist's sampling theorem to determine the degrees of freedom offered by bandlimited channels, that linear modulation with a bandlimited modulation pulse can be used to fill all of these degrees of freedom. We discuss how to design bandlimited modulation pulses based on the Nyquist criterion for intersymbol interference (ISI) avoidance. Finally, we discuss orthogonal and biorthogonal modulation in Section 4.4.

Software: Over the course of this and later chapters, we develop a simulation framework for simulating linear modulation over noisy dispersive channels. Software Lab 4.1 in this chapter is a first step in this direction. Appendix 4.B provides guidance for developing the software for this lab.


4.1  Signal  Constellations 


Figure 4.2: BPSK illustrated for a carrier frequency $f_c$ and symbol sequence $+1, -1, -1$. The solid line corresponds to the passband signal $u_p(t)$, and the dashed line to the baseband signal $u(t)$. Note that, due to the change in sign between the first and second symbols, there is a phase discontinuity of $\pi$ at $t = T$.


The linearly modulated signal depicted in Figure 4.1 can be written in the following general form:

$$u(t) = \sum_n b[n]\, p(t - nT) \qquad (4.1)$$

where $\{b[n]\}$ is a sequence of symbols, and $p(t)$ is the modulating pulse. The symbols take values in $\{-1, +1\}$ in our example, and the modulating pulse is a rectangular timelimited pulse. As we proceed through this chapter, we shall see that linear modulation as in (4.1) is far more generally applicable, both in the set of possible values taken by the symbol sequence and in the choice of modulating pulse.
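As a rough numerical illustration of (4.1), the sketch below builds samples of $u(t)$ by placing each symbol at $t = nT$ and convolving with a sampled pulse. It is a minimal sketch, and it rests on a few assumptions:
- NumPy is available.
- $T$ corresponds to a hypothetical oversampling factor `oversample` (samples per symbol interval).
- The helper name `linear_modulate` is ours, not the book's.

```python
import numpy as np

def linear_modulate(symbols, pulse, oversample):
    """Samples of u(t) = sum_n b[n] p(t - nT), with T = `oversample` samples.

    `pulse` holds samples of p(t) starting at t = 0; it may be longer than
    one symbol interval, in which case successive pulses overlap and add.
    """
    impulses = np.zeros(len(symbols) * oversample, dtype=complex)
    impulses[::oversample] = symbols          # b[n] placed at t = nT
    return np.convolve(impulses, pulse)       # superpose shifted, scaled pulses

# BPSK as in Figure 4.1: symbols in {-1, +1}, rectangular pulse on [0, T)
oversample = 8
rect = np.ones(oversample)
bits = np.array([+1, -1, -1, +1])
u = linear_modulate(bits, rect, oversample)
# Each row is one symbol interval; with the rectangular pulse it is constant at b[n]
print(u[: len(bits) * oversample].real.reshape(len(bits), oversample))
```

The same helper works unchanged for complex-valued symbols and for pulses that extend beyond one symbol interval.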

The  modulated  waveform  (4.1)  is  a  baseband  waveform.  While  it  is  timelimited  in  our  example, 
and  hence  cannot  be  strictly  bandlimited,  it  is  approximately  bandlimited  to  a  band  around  DC. 
Now,  if  we  are  given  a  passband  channel  over  which  to  send  the  information  encoded  in  this 
waveform, one easy approach is to send the passband signal

$$u_p(t) = u(t)\cos 2\pi f_c t \qquad (4.2)$$

where $f_c$ is the carrier frequency. That is, the modulated baseband signal is sent as the I component of the passband signal. To see what happens to the passband signal as a consequence of the modulation, we plot it in Figure 4.2. For the $n$th symbol interval $nT \le t < (n+1)T$, we have:
- $u_p(t) = \cos 2\pi f_c t$ if $b[n] = +1$;
- $u_p(t) = -\cos 2\pi f_c t = \cos(2\pi f_c t + \pi)$ if $b[n] = -1$.

Thus, binary antipodal modulation switches the phase of the carrier between two values, $0$ and $\pi$. This is why it is termed Binary Phase Shift Keying (BPSK) when applied to a passband channel.

We know from Chapter 2 that any passband signal can be represented in terms of two real-valued baseband waveforms, the I and Q components:

$$u_p(t) = u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t$$

The complex envelope of $u_p(t)$ is given by $u(t) = u_c(t) + j u_s(t)$. For BPSK, the I component is modulated using binary antipodal signaling, while the Q component is not used, so that $u(t) = u_c(t)$. However, the two signals $u_c(t)\cos 2\pi f_c t$ and $u_s(t)\sin 2\pi f_c t$ are orthogonal regardless of the choice of $u_c$ and $u_s$ (for a carrier frequency $f_c$ that is large compared with the bandwidths of $u_c$ and $u_s$). We can therefore modulate the I and Q components independently without affecting their orthogonality. In this case, we have

$$u_c(t) = \sum_n b_c[n]\, p(t - nT), \qquad u_s(t) = \sum_n b_s[n]\, p(t - nT)$$

The complex envelope is given by

$$u(t) = u_c(t) + j u_s(t) = \sum_n \left(b_c[n] + j b_s[n]\right) p(t - nT) = \sum_n b[n]\, p(t - nT) \qquad (4.3)$$

where $\{b[n] = b_c[n] + j b_s[n]\}$ are complex-valued symbols.


Figure 4.3: QPSK illustrated for a carrier frequency $f_c$, with symbol sequences $\{b_c[n]\} = \{+1, -1, -1\}$ and $\{b_s[n]\} = \{-1, +1, -1\}$. The phase of the passband signal is $-\pi/4$ in the first symbol interval, switches to $3\pi/4$ in the second, and to $-3\pi/4$ in the third.


Let us see what happens to the passband signal when $b_c[n]$ and $b_s[n]$ each take values in $\{\pm 1\}$, so that $b[n]$ takes values in $\{\pm 1 \pm j\}$. For the $n$th symbol interval $nT \le t < (n+1)T$:

$$\begin{aligned}
u_p(t) &= \cos 2\pi f_c t - \sin 2\pi f_c t = \sqrt{2}\cos(2\pi f_c t + \pi/4) && \text{if } b_c[n] = +1,\ b_s[n] = +1;\\
u_p(t) &= \cos 2\pi f_c t + \sin 2\pi f_c t = \sqrt{2}\cos(2\pi f_c t - \pi/4) && \text{if } b_c[n] = +1,\ b_s[n] = -1;\\
u_p(t) &= -\cos 2\pi f_c t - \sin 2\pi f_c t = \sqrt{2}\cos(2\pi f_c t + 3\pi/4) && \text{if } b_c[n] = -1,\ b_s[n] = +1;\\
u_p(t) &= -\cos 2\pi f_c t + \sin 2\pi f_c t = \sqrt{2}\cos(2\pi f_c t - 3\pi/4) && \text{if } b_c[n] = -1,\ b_s[n] = -1.
\end{aligned}$$

Thus, the modulation causes the passband signal to switch its phase among four possibilities, $\{\pm\pi/4, \pm 3\pi/4\}$, as illustrated in Figure 4.3. This is why we call it Quadrature Phase Shift Keying (QPSK).

Equivalently, we could have seen this from the complex envelope. Note that the QPSK symbols can be written as $b[n] = \sqrt{2}e^{j\theta[n]}$, where $\theta[n] \in \{\pm\pi/4, \pm 3\pi/4\}$. Thus, over the $n$th symbol, we have

$$u_p(t) = \mathrm{Re}\left(b[n] e^{j2\pi f_c t}\right) = \mathrm{Re}\left(\sqrt{2}e^{j(2\pi f_c t + \theta[n])}\right) = \sqrt{2}\cos(2\pi f_c t + \theta[n]), \qquad nT \le t < (n+1)T$$

This indicates that it is actually easier to figure out what is happening to the passband signal by working with the complex envelope. We therefore work in the complex baseband domain for the remainder of this chapter.
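A quick numerical check of this equivalence, assuming NumPy, works as follows:
- It takes the symbol sequences of Figure 4.3 and reads off the phase $\theta[n]$ of each complex symbol.
- It confirms that $\mathrm{Re}(b[n]e^{j2\pi f_c t})$, $b_c[n]\cos 2\pi f_c t - b_s[n]\sin 2\pi f_c t$ and $\sqrt{2}\cos(2\pi f_c t + \theta[n])$ agree over one symbol interval.

The carrier frequency `fc = 4.0` (in cycles per symbol interval, with $T = 1$) is an arbitrary assumption made only for illustration.

```python
import numpy as np

# QPSK symbol sequences from Figure 4.3
bc = np.array([+1, -1, -1])
bs = np.array([-1, +1, -1])
b = bc + 1j * bs                      # b[n] = b_c[n] + j b_s[n]

theta = np.angle(b)                   # phase theta[n] of each symbol
r = np.abs(b)                         # envelope; sqrt(2) for every QPSK symbol
print(theta / np.pi)                  # phases in units of pi

fc = 4.0                              # assumed carrier frequency (T = 1)
t = np.linspace(0.0, 1.0, 200, endpoint=False)
for n in range(len(b)):
    via_envelope = np.real(b[n] * np.exp(2j * np.pi * fc * t))
    via_iq = bc[n] * np.cos(2 * np.pi * fc * t) - bs[n] * np.sin(2 * np.pi * fc * t)
    via_phase = r[n] * np.cos(2 * np.pi * fc * t + theta[n])
    # all three descriptions of u_p(t) over the nth symbol interval coincide
    assert np.allclose(via_envelope, via_iq) and np.allclose(via_iq, via_phase)
```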
In general, the complex envelope for a linearly modulated signal is given by (4.1), where $b[n] = b_c[n] + j b_s[n] = r[n]e^{j\theta[n]}$ can be complex-valued. We can view this in two ways:
- as $b_c[n]$ modulating the I component and $b_s[n]$ modulating the Q component;
- or as scaling the envelope by $r[n]$ and switching the phase by $\theta[n]$.

The set of values that each symbol can take is called the signaling alphabet, or constellation. We can plot the constellation in a two-dimensional plot, with the x-axis denoting the real part $b_c[n]$ (corresponding to the I component) and the y-axis denoting the imaginary part $b_s[n]$ (corresponding to the Q component). Indeed, this is why linear modulation over passband channels is also termed two-dimensional modulation. Note that this provides a unified description of constellations that can be used over both baseband and passband channels: for physical baseband channels, we simply constrain $b[n] = b_c[n]$ to be real-valued, setting $b_s[n] = 0$.


[Figure 4.4 panels: BPSK/2PAM, QPSK/4PSK/4QAM, 4PAM, 16QAM]


Figure  4.4:  Some  commonly  used  constellations.  Note  that  2PAM  and  4PAM  can  be  used  over 
both  baseband  and  passband  channels,  while  the  two-dimensional  constellations  QPSK,  8PSK 
and  16QAM  are  for  use  over  passband  channels. 


Figure 4.4 shows some common constellations:
- Pulse Amplitude Modulation (PAM) corresponds to using multiple amplitude levels along the I component (setting the Q component to zero). This is often used for signaling over physical baseband channels.
- Using PAM along both I and Q axes corresponds to Quadrature Amplitude Modulation (QAM).
- If the constellation points lie on a circle, they only affect the phase of the carrier: such signaling schemes are termed Phase Shift Keying (PSK).

When naming a modulation scheme, we usually indicate the number of points in the constellation. BPSK and QPSK are special: BPSK (or 2PSK) can also be classified as 2PAM, while QPSK (or 4PSK) can also be classified as 4QAM.

Each symbol in a constellation of size $M$ can be uniquely mapped to $\log_2 M$ bits. For a symbol rate of $1/T$ symbols per unit time, the bit rate is therefore $\frac{\log_2 M}{T}$ bits per unit time. The transmitted bits often contain redundancy due to a channel code employed for error correction or detection, so the information rate is typically smaller than the bit rate. The choice of constellation for a particular application depends on considerations such as power-bandwidth tradeoffs and implementation complexity. We shall discuss these issues once we develop more background.
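To make the $\log_2 M$ bits-per-symbol bookkeeping concrete, here is a minimal sketch of a 16QAM mapper and the corresponding bit rate. The Gray labeling `GRAY_4PAM` and the helper names are hypothetical choices made for illustration; the text does not prescribe a particular bit-to-symbol map. The bit rate computed here ignores any channel-code redundancy.

```python
import math

# Hypothetical Gray labeling of the 4PAM levels {-3, -1, +1, +3} on each axis
# (one possible choice; adjacent levels differ in exactly one bit).
GRAY_4PAM = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}

def bits_to_16qam(bits):
    """Map a bit list (length a multiple of log2(16) = 4) to 16QAM symbols."""
    bits = [int(x) for x in bits]
    if len(bits) % 4:
        raise ValueError("16QAM needs a multiple of 4 bits")
    symbols = []
    for k in range(0, len(bits), 4):
        i_level = GRAY_4PAM[(bits[k], bits[k + 1])]      # I component b_c[n]
        q_level = GRAY_4PAM[(bits[k + 2], bits[k + 3])]  # Q component b_s[n]
        symbols.append(complex(i_level, q_level))
    return symbols

def bit_rate(M, symbol_rate):
    """Bit rate log2(M) / T for symbol rate 1/T and constellation size M."""
    return math.log2(M) * symbol_rate

print(bits_to_16qam([0, 0, 1, 0, 1, 1, 0, 1]))   # [(-3+3j), (1-1j)]
print(bit_rate(16, 5e6))                         # 20000000.0
```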
4.2 Bandwidth Occupancy


Bandwidth is a precious commodity, hence it is important to quantify the frequency occupancy of communication signals. To this end, consider the complex envelope of a linearly modulated signal, which has the form given in (4.1): $u(t) = \sum_n b[n]\, p(t - nT)$. (The two-sided bandwidth of this complex envelope equals the physical bandwidth of the corresponding passband signal.)

The complex-valued symbol sequence $\{b[n]\}$ is modeled as random:
- Modeling the sequence as random at the transmitter makes sense because the latter does not control the information being sent (e.g., it depends on the specific computer file or digital audio signal being sent). Since this information is mapped to the symbols in some fashion, it follows that the symbols themselves are also random rather than deterministic.
- Modeling the symbols as random at the receiver makes even more sense, since the receiver by definition does not know the symbol sequence (otherwise there would be no need to transmit).

However, for characterizing the bandwidth occupancy of the digitally modulated signal $u$, we do not compute statistics across different possible realizations of the symbol sequence $\{b[n]\}$. Rather, we define the quantities of interest in terms of averages across time, treating $u(t)$ as a finite-power signal that can be modeled as deterministic once the symbol sequence is fixed. (We discuss concepts of statistical averaging across realizations later, when we discuss random processes in Chapter 5.)

We  introduce  the  concept  of  PSD  in  Section  4.2.1.  In  Section  4.2.2,  we  state  our  main  result  on 
the  PSD  of  digitally  modulated  signals,  and  discuss  how  to  compute  bandwidth  once  we  know 
the  PSD. 


4.2.1  Power  Spectral  Density 


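[Figure 4.5 diagram: $x(t)$ → ideal narrowband filter $H(f)$ (unit gain over a band of width $\Delta f$ centered at $f^*$) → power meter, reading approximately $S_x(f^*)\Delta f$]


Figure 4.5: Operational definition of PSD.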


We now introduce the important concept of power spectral density (PSD), which specifies how the power in a signal is distributed in different frequency bands.

Power Spectral Density: The power spectral density (PSD), $S_x(f)$, for a finite-power signal $x(t)$ is defined through the conceptual measurement depicted in Figure 4.5. Pass $x(t)$ through an ideal narrowband filter with transfer function

$$H_{f^*}(f) = \begin{cases} 1, & f^* - \frac{\Delta f}{2} < f < f^* + \frac{\Delta f}{2} \\ 0, & \text{else} \end{cases}$$

The PSD evaluated at $f^*$, $S_x(f^*)$, is defined as the measured power at the filter output, divided by the filter width $\Delta f$ (in the limit as $\Delta f \to 0$).

Example (PSD of complex exponentials): Let us now find the PSD of $x(t) = A e^{j(2\pi f_0 t + \theta)}$. Since the frequency content of $x$ is concentrated at $f_0$, the power meter in Figure 4.5 will have zero output for $f^* \ne f_0$ (as $\Delta f \to 0$, $f_0$ falls outside the filter bandwidth for any such $f^*$). Thus, $S_x(f) = 0$ for $f \ne f_0$. On the other hand, for $f^* = f_0$, the output of the power meter is the entire power of $x$, which is

$$P_x = A^2 = \int S_x(f)\, df$$

We conclude that the PSD is $S_x(f) = A^2\delta(f - f_0)$. Extending this reasoning to a sum of complex exponentials, we have

$$\text{PSD of } \sum_i A_i e^{j(2\pi f_i t + \theta_i)} = \sum_i A_i^2\, \delta(f - f_i)$$

where the $f_i$ are distinct frequencies (positive or negative), and $A_i$, $\theta_i$ are the amplitude and phase, respectively, of the $i$th complex exponential. Thus, for a real-valued sinusoid, we obtain

$$S_x(f) = \frac{1}{4}\delta(f - f_0) + \frac{1}{4}\delta(f + f_0), \quad \text{for } x(t) = \cos(2\pi f_0 t + \theta) = \frac{1}{2}e^{j(2\pi f_0 t + \theta)} + \frac{1}{2}e^{-j(2\pi f_0 t + \theta)} \qquad (4.4)$$


Periodogram-based PSD estimation: One way to carry out the conceptual measurement in Figure 4.5 takes three steps:
1. Limit $x(t)$ to a finite observation interval.
2. Compute its Fourier transform, and hence its energy spectral density (the magnitude square of the Fourier transform).
3. Divide by the length of the observation interval.

The PSD is obtained by letting the observation interval get large. Specifically, define the time-windowed version of $x$ as

$$x_{T_o}(t) = x(t)\, I_{[-\frac{T_o}{2}, \frac{T_o}{2}]}(t) \qquad (4.5)$$

where $T_o$ is the length of the observation interval. Since $T_o$ is finite and $x(t)$ has finite power, $x_{T_o}(t)$ has finite energy, and we can compute its Fourier transform

$$X_{T_o}(f) = \mathcal{F}(x_{T_o})$$

The energy spectral density of $x_{T_o}$ is given by $|X_{T_o}(f)|^2$. Averaging this over the observation interval, we obtain the estimated PSD

$$\hat{S}_x(f) = \frac{|X_{T_o}(f)|^2}{T_o} \qquad (4.6)$$


The  estimate  in  (4.6),  which  is  termed  a  periodogram,  can  typically  be  obtained  by  taking  the 
DFT  of  a  sampled  version  of  the  time  windowed  signal;  the  time  interval  To  must  be  large  enough 
to  give  the  desired  frequency  resolution,  while  the  sampling  rate  must  be  large  enough  to  capture 
the  variations  in  x{t).  The  estimated  PSDs  obtained  over  multiple  observation  intervals  can  then 
be  averaged  further  to  get  smoother  estimates. 
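The periodogram recipe above can be sketched numerically as follows, assuming NumPy and a uniformly sampled signal:
- The Fourier transform is approximated by `dt` times the DFT.
- Estimates from consecutive, non-overlapping observation intervals are averaged.
- The function name `averaged_periodogram` and all parameter values are illustrative choices.

For the complex exponential example above, the estimate should place all of the power $A^2$ at $f_0$, and summing the estimate over frequency should return the power, as in (4.8).

```python
import numpy as np

def averaged_periodogram(x, dt, seg_len):
    """Estimate S_x(f) by averaging |X_To(f)|^2 / To over non-overlapping segments.

    x: samples of x(t) at spacing dt; seg_len: samples per observation
    interval, so To = seg_len * dt and the frequency resolution is 1/To.
    """
    n_seg = len(x) // seg_len
    segs = np.reshape(x[: n_seg * seg_len], (n_seg, seg_len))
    To = seg_len * dt
    X = dt * np.fft.fft(segs, axis=1)          # Riemann-sum approximation of X_To(f)
    S = np.mean(np.abs(X) ** 2, axis=0) / To   # periodogram (4.6), averaged over segments
    f = np.fft.fftfreq(seg_len, d=dt)
    return np.fft.fftshift(f), np.fft.fftshift(S)

# Complex exponential A exp(j(2 pi f0 t + theta)) with f0 on the frequency grid
dt, A, f0, theta = 0.01, 2.0, 5.0, 0.7
t = np.arange(10_000) * dt
f, S = averaged_periodogram(A * np.exp(1j * (2 * np.pi * f0 * t + theta)), dt, 1000)
df = f[1] - f[0]
print(f[np.argmax(S)], np.sum(S) * df)         # peak frequency, and total power as in (4.8)
```

Choosing `seg_len` trades frequency resolution ($1/T_o$) against the number of segments available for averaging.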

Formally, we can define the PSD in the limit of large time windows as follows:

$$S_x(f) = \lim_{T_o \to \infty} \frac{|X_{T_o}(f)|^2}{T_o} \qquad (4.7)$$


Units  for  PSD:  Power  per  unit  frequency  has  the  same  units  as  power  multiplied  by  time,  or 
energy.  Thus,  the  PSD  is  expressed  in  units  of  Watts/Hertz,  or  Joules. 


Power in terms of PSD: The power of a finite-power signal $x$ is given by integrating its PSD:

$$P_x = \int_{-\infty}^{\infty} S_x(f)\, df \qquad (4.8)$$
4.2.2 PSD of a linearly modulated signal


We are now ready to state our result on the PSD of a linearly modulated signal $u(t) = \sum_n b[n]\, p(t - nT)$. We derive a more general result in Appendix 4.A; our result here applies to the following important special case:

(a) the symbols have zero DC value: $\lim_{N \to \infty} \frac{1}{2N+1}\sum_{n=-N}^{N} b[n] = 0$; and

(b) the symbols are uncorrelated: $\lim_{N \to \infty} \frac{1}{2N+1}\sum_{n=-N}^{N} b[n]\, b^*[n-k] = 0$ for $k \ne 0$.


Theorem 4.2.1 (PSD of a linearly modulated signal) Consider a linearly modulated signal

$$u(t) = \sum_n b[n]\, p(t - nT)$$

where the symbol sequence $\{b[n]\}$ is zero mean and uncorrelated with average symbol energy

$$\lim_{N \to \infty} \frac{1}{2N+1}\sum_{n=-N}^{N} |b[n]|^2 = \sigma_b^2$$

Then the PSD is given by

$$S_u(f) = \frac{\sigma_b^2}{T}\, |P(f)|^2 \qquad (4.9)$$

and the power of the modulated signal is

$$P_u = \frac{\sigma_b^2\, \|p\|^2}{T} \qquad (4.10)$$

where $\|p\|^2$ denotes the energy of the modulating pulse.


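See Appendix 4.A for a proof of (4.9), which follows from specializing a more general expression. The expression for power follows from integrating the PSD:

$$P_u = \int_{-\infty}^{\infty} S_u(f)\, df = \frac{\sigma_b^2}{T}\int_{-\infty}^{\infty} |P(f)|^2\, df = \frac{\sigma_b^2}{T}\int_{-\infty}^{\infty} |p(t)|^2\, dt = \frac{\sigma_b^2\, \|p\|^2}{T}$$

where we have used Parseval's identity.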

An intuitive interpretation of this theorem is as follows. Every $T$ time units, we send a pulse of the form $b[n]p(t - nT)$ with average energy spectral density $\sigma_b^2|P(f)|^2$. The PSD is therefore obtained by dividing this by $T$. The same reasoning applies to the expression for power: every $T$ time units, we send a pulse $b[n]p(t - nT)$ with average energy $\sigma_b^2\|p\|^2$, so that the power is obtained by dividing by $T$. This intuition does not apply when successive symbols are correlated; in that case we get the more complicated expression (4.32) for the PSD in Appendix 4.A.
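Theorem 4.2.1 can be checked by simulation. The sketch below, assuming NumPy:
- modulates i.i.d. 4PAM symbols (zero mean, uncorrelated, $\sigma_b^2 = 5$) with the unit-energy sine pulse $\sqrt{2}\sin(\pi t/T)I_{[0,T]}(t)$ discussed later in this section;
- compares the time-averaged power with (4.10);
- compares an averaged periodogram at $f = 0$ with (4.9).

The symbol alphabet, random seed, oversampling factor and segment length are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
T, oversample = 1.0, 16
dt = T / oversample

# Unit-energy sine pulse sqrt(2) sin(pi t / T) on [0, T), sampled at midpoints
t = (np.arange(oversample) + 0.5) * dt
pulse = np.sqrt(2) * np.sin(np.pi * t / T)
energy = np.sum(pulse ** 2) * dt                 # ||p||^2 (equals 1 here)

# i.i.d. 4PAM symbols {-3, -1, +1, +3}: zero mean, uncorrelated, sigma_b^2 = 5
n_sym = 100_000
b = rng.choice([-3.0, -1.0, 1.0, 3.0], size=n_sym)
impulses = np.zeros(n_sym * oversample)
impulses[::oversample] = b
u = np.convolve(impulses, pulse)[: n_sym * oversample]   # u(t) = sum_n b[n] p(t - nT)

# Power: time average of |u|^2 versus (4.10)
power_sim = np.mean(np.abs(u) ** 2)
power_thy = 5.0 * energy / T

# PSD at f = 0: periodogram averaged over 50-symbol intervals versus (4.9)
seg = 50 * oversample
segs = u.reshape(-1, seg)
S0_sim = np.mean(np.abs(dt * segs.sum(axis=1)) ** 2) / (seg * dt)
S0_thy = 5.0 * (np.sum(pulse) * dt) ** 2 / T     # sigma_b^2 |P(0)|^2 / T
print(power_sim, power_thy, S0_sim, S0_thy)
```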

Once we know the PSD, we can define the bandwidth of $u$ in a number of ways.

3 dB bandwidth: For symmetric $S_u(f)$ with a maximum at $f = 0$, the 3 dB bandwidth $B_{3dB}$ is defined by $S_u(B_{3dB}/2) = S_u(-B_{3dB}/2) = \frac{1}{2}S_u(0)$. That is, the 3 dB bandwidth is the size of the interval between the points at which the PSD is 3 dB, or a factor of $\frac{1}{2}$, smaller than its maximum value.

Fractional power containment bandwidth: This is the size of the smallest interval that contains a given fraction of the power. For example, for symmetric $S_u(f)$, the 99% fractional power containment bandwidth $B$ is defined by

$$\int_{-B/2}^{B/2} S_u(f)\, df = 0.99\, P_u = 0.99\int_{-\infty}^{\infty} S_u(f)\, df$$

(replace 0.99 in the preceding equation by any desired fraction $\gamma$ to get the corresponding $\gamma$ power containment bandwidth).

Time/frequency normalization: Before we discuss examples in detail, let us simplify our life by making a simple observation on time and frequency scaling. Suppose we have a linearly modulated system operating at a symbol rate of $1/T$, as in (4.1). We can think of it as a normalized system operating at a symbol rate of one, where the unit of time is $T$. This implies that the unit of frequency is $1/T$. In terms of these new units, we can write the linearly modulated signal as

$$u_1(t) = \sum_n b[n]\, p_1(t - n)$$

where $p_1(t)$ is the modulation pulse for the normalized system. For example, for a rectangular pulse timelimited to the symbol interval, we have $p_1(t) = I_{[0,1]}(t)$. Suppose now that the bandwidth of the normalized system (computed using any definition that we please) is $B_1$. Since the unit of frequency is $1/T$, the bandwidth in the original system is $B_1/T$. Thus, in terms of determining frequency occupancy, we can work, without loss of generality, with the normalized system. In the original system, what we are really doing is working with the normalized time $t/T$ and the normalized frequency $fT$.
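To see the normalization at work, the sketch below computes the 3 dB bandwidth of the rectangular-pulse PSD $\sigma_b^2 T\,\mathrm{sinc}^2(fT)$ (which follows from (4.9) with $p(t) = I_{[0,T]}(t)$ and $\sigma_b^2 = 1$). It does this for the normalized system ($T = 1$) and for $T = 0.2$, and confirms that the bandwidth simply scales as $B_1/T$.

It uses the 3 dB definition given above, finds the half-power point by plain bisection, and assumes NumPy. The helpers `psd_rect` and `bandwidth_3db` are illustrative names.

```python
import numpy as np

def psd_rect(f, T):
    """S_u(f) = sigma_b^2 |P(f)|^2 / T for p(t) = I_[0,T](t), with sigma_b^2 = 1."""
    return T * np.sinc(f * T) ** 2            # np.sinc(x) = sin(pi x) / (pi x)

def bandwidth_3db(psd, f_hi, tol=1e-12):
    """Width between the half-power points of a symmetric PSD peaked at f = 0.

    Bisection on [0, f_hi], assuming psd(f) decreases from psd(0) over that range.
    """
    half = 0.5 * psd(0.0)
    lo, hi = 0.0, f_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psd(mid) > half:
            lo = mid
        else:
            hi = mid
    return 2 * lo

B1 = bandwidth_3db(lambda f: psd_rect(f, 1.0), 0.5)   # normalized system
T = 0.2                                                # 5 symbols per unit time
B = bandwidth_3db(lambda f: psd_rect(f, T), 0.5 / T)
print(B1, B, B1 / T)
```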



Figure  4.6:  PSD  corresponding  to  rectangular  and  sine  timelimited  pulses.  The  main  lobe  of  the 
PSD  is  broader  for  the  sine  pulse,  but  its  99%  power  containment  bandwidth  is  much  smaller. 


Rectangular pulse: Without loss of generality, consider a normalized system with $p_1(t) = I_{[0,1]}(t)$, for which $P_1(f) = \mathrm{sinc}(f)\, e^{-j\pi f}$. For $\{b[n]\}$ i.i.d., taking values $\pm 1$ with equal probability, we have $\sigma_b^2 = 1$. Applying (4.9), we obtain

$$S_{u_1}(f) = \sigma_b^2\, \mathrm{sinc}^2(f) \qquad (4.11)$$

Integrating, or applying (4.10), we obtain $P_u = \sigma_b^2$. The scale factor $\sigma_b^2$ is not important, since it drops out for any definition of bandwidth. We therefore set it to $\sigma_b^2 = 1$. Figure 4.6 plots the PSD for the rectangular pulse, along with that for a sine pulse introduced shortly.
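As a sanity check on (4.11), and on the sine-pulse spectrum (4.13) derived shortly, the sketch below computes $|P_1(f)|^2$ by direct numerical integration for two normalized pulses:
- the rectangular pulse $I_{[0,1]}(t)$;
- the sine pulse $\sqrt{2}\sin(\pi t)I_{[0,1]}(t)$.

It compares the results with the closed forms (with $\sigma_b^2 = 1$). It assumes NumPy, and `energy_spectrum` is an illustrative helper. The frequency grid skips $f = 1/2$, where the closed form for the sine pulse is a $0/0$ limit.

```python
import numpy as np

def energy_spectrum(pulse_fn, f, n=20_000):
    """|P1(f)|^2 for a pulse supported on [0, 1], via a midpoint-rule Fourier integral."""
    dt = 1.0 / n
    t = (np.arange(n) + 0.5) * dt
    kernel = np.exp(-2j * np.pi * np.outer(f, t))   # exp(-j 2 pi f t), one row per f
    P = kernel @ pulse_fn(t) * dt
    return np.abs(P) ** 2

f = np.linspace(0.05, 4.95, 50)                      # grid avoiding f = 1/2
rect = energy_spectrum(lambda t: np.ones_like(t), f)
sine = energy_spectrum(lambda t: np.sqrt(2) * np.sin(np.pi * t), f)

rect_formula = np.sinc(f) ** 2                                                    # (4.11)
sine_formula = 8 * np.cos(np.pi * f) ** 2 / (np.pi ** 2 * (1 - 4 * f ** 2) ** 2)  # (4.13)
print(np.max(np.abs(rect - rect_formula)), np.max(np.abs(sine - sine_formula)))
```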
Note that the PSD for the rectangular pulse has much fatter tails, which does not bode well for its bandwidth efficiency. For fractional power containment bandwidth with fraction $\gamma$, we have the equation

$$\int_{-B_1/2}^{B_1/2} \mathrm{sinc}^2 f\, df = \gamma\int_{-\infty}^{\infty} \mathrm{sinc}^2 f\, df = \gamma\int_0^1 1^2\, dt = \gamma$$

using Parseval's identity. We therefore obtain, using the symmetry of the PSD, that the bandwidth is the numerical solution to the equation

$$\int_0^{B_1/2} \mathrm{sinc}^2 f\, df = \gamma/2 \qquad (4.12)$$

For example, for $\gamma = 0.99$, we obtain $B_1 \approx 20.6$, while for $\gamma = 0.9$, we obtain $B_1 \approx 1.7$ (the main lobe $|f| \le 1$ alone contains about 90.3% of the power). Thus, if we wish to be strict about power containment (e.g., in order to limit adjacent channel interference in wireless systems), the rectangular timelimited pulse is a very poor choice. On the other hand, in systems where interference or regulation are not significant issues (e.g., low-cost wired systems), this pulse may be a good choice because of its ease of implementation using digital logic.


Example 4.2.1 (Bandwidth computation): Consider a passband system operating at a carrier frequency of 2.4 GHz with a bit rate of 20 Mbps. A rectangular modulation pulse timelimited to the symbol interval is employed.

(a) Find the 99% and 90% power containment bandwidths if the constellation used is 16-QAM.

(b) Find the 99% and 90% power containment bandwidths if the constellation used is QPSK.

Solution:

(a) The 16-QAM system sends 4 bits/symbol, so that the symbol rate $1/T$ equals $20/4 = 5$ Msymbols/sec. Since the 99% power containment bandwidth for the normalized system is $B_1 \approx 20.6$, the required bandwidth is $B_1/T \approx 103$ MHz. Since the 90% power containment bandwidth for the normalized system is $B_1 \approx 1.7$, the required bandwidth $B_1/T$ is approximately 8.5 MHz.

(b) The QPSK system sends 2 bits/symbol, so that the symbol rate is 10 Msymbols/sec. The bandwidths required are therefore double those in (a): the 99% power containment bandwidth is approximately 206 MHz, while the 90% power containment bandwidth is approximately 17 MHz.

Clearly, when the criterion for defining bandwidth is the same, 16-QAM consumes half the bandwidth of QPSK for a fixed bit rate. However, consider the rectangular timelimited pulse with different criteria. A QPSK system where we are sloppy about power leakage (90% power containment bandwidth of about 17 MHz) can require far less bandwidth than a system using a more bandwidth-efficient 16-QAM constellation where we are strict about power leakage (99% power containment bandwidth of about 103 MHz). This extreme variation of bandwidth when we tweak definitions slightly is because of the poor frequency domain containment of the rectangular timelimited pulse. Thus, if we are serious about limiting frequency occupancy, we need to think about more sophisticated designs for the modulation pulse.


Smoothing out the rectangular pulse: A useful alternative to the rectangular pulse, while still keeping the modulating pulse timelimited to a symbol interval, is the sine pulse. For the normalized system it equals

$$p_1(t) = \sqrt{2}\sin(\pi t)\, I_{[0,1]}(t)$$

Since the sine pulse does not have the sharp edges of the rectangular pulse in the time domain, we expect it to be more compact in the frequency domain. Note that we have normalized the pulse to have unit energy, as we did for the normalized rectangular pulse. This implies that the power of the modulated signal is the same in the two cases. We can therefore compare PSDs under the constraint that the area under the PSDs remains constant. Setting $\sigma_b^2 = 1$ and using (4.9), we obtain (see Problem 4.1):

$$S_{u_1}(f) = \frac{8\cos^2 \pi f}{\pi^2 (1 - 4f^2)^2} \qquad (4.13)$$

Proceeding as we did for obtaining (4.12), the fractional power containment bandwidth for fraction $\gamma$ is given by the formula:

$$\frac{8}{\pi^2}\int_0^{B_1/2} \frac{\cos^2 \pi f}{(1 - 4f^2)^2}\, df = \gamma/2 \qquad (4.14)$$

For $\gamma = 0.99$, we obtain $B_1 \approx 2.4$, which is close to an order of magnitude improvement over the corresponding value of $B_1 \approx 20.6$ for the rectangular pulse.
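The fractional power containment equations (4.12) and (4.14) have no closed-form solution, so here is a minimal numerical sketch, assuming NumPy. The method is:
- Accumulate a trapezoidal integral of the symmetric PSD from $f = 0$ upward.
- Stop when the one-sided integral reaches $\gamma/2$ of the total power.
- Take the total power as 1, by Parseval's identity, because both normalized pulses have unit energy.

The helper names, grid spacing `df` and search limits `f_max` are illustrative choices. The printed values are the normalized bandwidths $B_1$ for the three cases discussed above.

```python
import numpy as np

def psd_rect(f):
    """(4.11) with sigma_b^2 = 1: sinc^2(f)."""
    return np.sinc(f) ** 2

def psd_sine(f):
    """(4.13): 8 cos^2(pi f) / (pi^2 (1 - 4 f^2)^2), using its limit 1/2 at f = 1/2."""
    f = np.asarray(f, dtype=float)
    den = np.pi ** 2 * (1 - 4 * f ** 2) ** 2
    out = np.full(f.shape, 0.5)
    ok = np.abs(1 - 4 * f ** 2) > 1e-6
    out[ok] = 8 * np.cos(np.pi * f[ok]) ** 2 / den[ok]
    return out

def containment_bandwidth(psd, gamma, f_max, df=1e-4, total_power=1.0):
    """Smallest B with integral of psd over [-B/2, B/2] >= gamma * total_power.

    Assumes psd is symmetric and nonnegative and that [0, f_max] is wide enough.
    total_power is supplied analytically (1 for a unit-energy pulse, by Parseval)
    because the slowly decaying tails make integrating to infinity impractical.
    """
    f = np.arange(0.0, f_max, df)
    s = psd(f)
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (s[1:] + s[:-1]) * df)))  # trapezoid
    idx = np.searchsorted(cum, gamma * total_power / 2)   # one-sided target gamma/2
    if idx == len(f):
        raise ValueError("f_max too small for this gamma")
    return 2 * f[idx]

print(containment_bandwidth(psd_rect, 0.90, f_max=5.0))    # (4.12), gamma = 0.9
print(containment_bandwidth(psd_rect, 0.99, f_max=15.0))   # (4.12), gamma = 0.99
print(containment_bandwidth(psd_sine, 0.99, f_max=5.0))    # (4.14), gamma = 0.99
```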

While  the  sine  pulse  has  better  frequency  domain  containment  than  the  rectangular  pulse,  it  is 
still  not  suitable  for  strictly  bandlimited  channels.  We  discuss  pulse  design  for  such  channels 
next. 


4.3  Design  for  Bandlimited  Channels 

Suppose that you are told to design your digital communication system so that the transmitted signal fits between 2.39 and 2.41 GHz; that is, you are given a passband channel of bandwidth 20 MHz at a carrier frequency of 2.4 GHz. Any signal that you transmit over this band has a complex envelope with respect to 2.4 GHz that occupies a band from −10 MHz to 10 MHz. Similarly, the passband channel (modeled as an LTI system) has an impulse response whose complex envelope is bandlimited from −10 MHz to 10 MHz. In general, for a passband channel or signal of bandwidth $W$, with an appropriate choice of reference frequency, we have a corresponding complex baseband signal spanning the band $[-W/2, W/2]$. Thus, we restrict our design to the complex baseband domain. The designs can then be translated to passband channels by upconversion of the I and Q components at the transmitter, and downconversion at the receiver. Also, note that the designs specialize to physical baseband channels if we restrict the baseband signals to be real-valued.


4.3.1 Nyquist's Sampling Theorem and the Sinc Pulse

Our first step in understanding communication system design for such a bandlimited channel is to understand the structure of bandlimited signals. To this end, suppose that the signal $s(t)$ is bandlimited to $[-W/2, W/2]$. We can now invoke Nyquist's sampling theorem (proof postponed to Section 4.5) to express the signal in terms of its samples at rate $W$.

Theorem 4.3.1 (Nyquist's sampling theorem) Any signal $s(t)$ bandlimited to $[-\frac{W}{2}, \frac{W}{2}]$ can be described completely by its samples $\{s(\frac{n}{W})\}$ at rate $W$. The signal $s(t)$ can be recovered from its samples using the following interpolation formula:

$$s(t) = \sum_{n=-\infty}^{\infty} s\!\left(\frac{n}{W}\right) p\!\left(t - \frac{n}{W}\right) \qquad (4.15)$$

where $p(t) = \mathrm{sinc}(Wt)$.
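A short numerical check of (4.15), assuming NumPy: we sample a signal known to be bandlimited to $[-W/2, W/2]$ at rate $W$, then rebuild it between the samples with the sinc interpolation formula, truncating the infinite sum.

The test signal $\mathrm{sinc}^2(Wt/2)$, whose spectrum is a triangle supported on $[-W/2, W/2]$, is our own choice. So are the truncation length and the evaluation times.

```python
import numpy as np

def sinc_interpolate(samples, n, W, t):
    """Evaluate (4.15): s(t) = sum_n s(n/W) sinc(W t - n), truncated to the given n."""
    t = np.atleast_1d(t)
    return np.sinc(W * t[:, None] - n[None, :]) @ samples

# Test signal bandlimited to [-W/2, W/2]: sinc^2(W t / 2) has a triangular spectrum
# on [-W/2, W/2]; its samples decay as 1/n^2, so truncation error is small.
W = 1.0
def s(t):
    return np.sinc(W * t / 2) ** 2

n = np.arange(-2000, 2001)                 # truncation of the infinite sum
samples = s(n / W)
t = np.array([0.3, 1.7, -4.25, 10.9])      # times between sampling instants
print(np.max(np.abs(sinc_interpolate(samples, n, W, t) - s(t))))
```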
Degrees of freedom: What does the sampling theorem tell us about digital modulation? The interpolation formula (4.15) tells us that we can interpret $s(t)$ as a linearly modulated signal with:
- symbol sequence equal to the samples $\{s(n/W)\}$;
- symbol rate $1/T$ equal to the bandwidth $W$;
- modulation pulse given by $p(t) = \mathrm{sinc}(Wt) \leftrightarrow P(f) = \frac{1}{W} I_{[-W/2, W/2]}(f)$.

Thus, linear modulation with the sinc pulse is able to exploit all the "degrees of freedom" available in a bandlimited channel.

Signal space: If we signal over an observation interval of length $T_o$ using linear modulation according to the interpolation formula (4.15), then we have approximately $WT_o$ complex-valued samples. The signals we send are continuous-time signals, which, in general, lie in an infinite-dimensional space. However, the set of possible signals we can send in a finite observation interval of length $T_o$ lives in a complex-valued vector space of finite dimension $WT_o$, or equivalently, a real-valued vector space of dimension $2WT_o$. Such geometric views of communication signals as vectors, often termed signal space concepts, are particularly useful in design and analysis, as we explore in more detail in Chapter 6.


Figure 4.7: Three successive sinc pulses (each pulse is truncated to a length of 10 symbol intervals on each side), each modulated by one of three bits. The actual transmitted signal is the sum of these pulses (not shown). Note that, while the pulses overlap, the samples at $t = 0, T, 2T$ are equal to the transmitted bits because only one pulse is nonzero at these times.


The concept of Nyquist signaling: Since the sinc pulse is not timelimited to a symbol interval, in principle, the symbols could interfere with each other. The time domain signal corresponding to a bandlimited modulation pulse such as the sinc spans an interval significantly larger than the symbol interval (in theory, the interval is infinitely large, but we always truncate the waveform in implementations). This means that successive pulses corresponding to successive symbols, which are spaced by the symbol interval (i.e., $b[n]p(t - nT)$ as we increment $n$), overlap with, and therefore can interfere with, each other.

Figure 4.7 shows the sinc pulse modulated by three bits. While the pulses corresponding to the three symbols do overlap, notice that, by sampling at $t = 0$, $t = T$ and $t = 2T$, we can recover the three symbols because exactly one of the pulses is nonzero at each of these times. That is, at sampling times spaced by integer multiples of the symbol time $T$, there is no intersymbol interference. We call such a pulse Nyquist for signaling at rate $\frac{1}{T}$, and we discuss other examples of such pulses soon.

Designing pulses based on the Nyquist criterion lets us expand the modulation pulses in time beyond the symbol interval, thus enabling better containment in the frequency domain. It still ensures that there is no ISI at appropriately chosen sampling times, despite the significant overlap between successive pulses.
Figure 4.8: The baseband signal for 10 BPSK symbols of alternating signs, modulated using the sinc pulse. The first symbol is +1, and the sample at time $t = 0$, marked with 'x', equals +1, as desired (no ISI). However, if the sampling time is off by $0.25T$, the sample value, marked by '+', becomes much smaller because of ISI. While it still has the right sign, the ISI causes it to have significantly smaller noise immunity. See Problem 4.14 for an example in which the ISI due to timing mismatch actually causes the sign to flip.
The problem with sinc: Are we done then? Should we just use linear modulation with a sinc pulse when confronted with a bandlimited channel? Unfortunately, the answer is no: just as the rectangular timelimited pulse decays too slowly in frequency, the rectangular bandlimited pulse, corresponding to the sinc pulse in the time domain, decays too slowly in time. Let us see what happens as a consequence. Figure 4.8 shows a plot of the modulated waveform for a bit sequence of alternating sign. At the correct sampling times, there is no ISI. However, if we consider a small timing error of 0.25T, the ISI causes the sample value to drop drastically, making the system more vulnerable to noise. What is happening is that, when there is a small sampling offset, we can make the ISI add up to a large value by choosing the interfering symbols so that their contributions all have signs opposite to that of the desired symbol at the sampling time. Since the sinc pulse decays as $1/t$, the ISI created for a given symbol by an interfering symbol which is $n$ symbol intervals away decays as $1/n$, so that, in the worst case, the contributions from the interfering symbols roughly have the form $\sum_n \frac{1}{n}$, a series that is known to diverge. Thus, in theory, if we do not truncate the sinc pulse, we can make the ISI arbitrarily large when there is a small timing offset. In practice, we do truncate the modulation pulse, so that we only see ISI from a finite number of symbols. However, even when we do truncate, as we see from Figure 4.8, the slow decay of the sinc pulse means that the ISI adds up quickly, and significantly reduces the margin of error when noise is introduced into the system.
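The worst-case argument above can be checked numerically. The sketch below is a minimal illustration, assuming time normalized so that $T = 1$, binary symbols $b[n] = \pm 1$ chosen adversarially, and a sampling offset of $0.25T$; the helper names `sinc` and `worst_case_isi` are our own, not from the text. It sums $|p(0.25 - n)|$ over the interfering symbols kept after truncation, which is the largest ISI magnitude an adversarial choice of signs can produce.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def worst_case_isi(pulse, offset, num_interferers):
    """Worst-case ISI magnitude when sampling at t = offset (time in units of T).

    Each interfering symbol n != 0 with |n| <= num_interferers is given the sign
    that opposes the desired symbol, so the ISI magnitudes simply add up.
    """
    return sum(
        abs(pulse(offset - n))
        for n in range(-num_interferers, num_interferers + 1)
        if n != 0
    )


offset = 0.25            # assumed timing error of 0.25T
desired = sinc(offset)   # attenuated desired-symbol sample
for n_int in (10, 100, 1000, 10000):
    isi = worst_case_isi(sinc, offset, n_int)
    print(f"+/-{n_int:5d} interferers: desired {desired:.3f}, "
          f"worst-case ISI {isi:.3f}, margin {desired - isi:+.3f}")
```

The worst-case ISI grows roughly logarithmically with the truncation length, mirroring the divergence of $\sum_n 1/n$; already with ten interferers on each side it exceeds the attenuated desired sample, so an adversarial symbol pattern can flip the sign.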

While the sinc pulse may not be a good idea in practice, the idea of using bandwidth-efficient Nyquist pulses is a good one, and we now develop it further.


4.3.2  Nyquist  Criterion  for  ISI  Avoidance 

Nyquist signaling: Consider a linearly modulated signal

$$u(t) = \sum_n b[n]\, p(t - nT)$$

We say that the pulse $p(t)$ is Nyquist (or satisfies the Nyquist criterion) for signaling at rate $\frac{1}{T}$ if the symbol-spaced samples of the modulated signal are equal to the symbols (or a fixed scalar multiple of the symbols); that is, $u(kT) = b[k]$ for all $k$. In other words, there is no ISI at appropriately chosen sampling times spaced by the symbol interval.

In the time domain, it is quite easy to see what is required to satisfy the Nyquist criterion. The samples $u(kT) = \sum_n b[n]\, p((k - n)T) = b[k]$ (or a scalar multiple of $b[k]$) for all $k$ if and only if $p(0) = 1$ (or some nonzero constant) and $p(mT) = 0$ for all integers $m \neq 0$. However, for the design of bandwidth-efficient pulses, it is important to characterize the Nyquist criterion in the frequency domain. This is given by the following theorem.

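Theorem 4.3.2 (Nyquist criterion for ISI avoidance): The pulse $p(t) \leftrightarrow P(f)$ is Nyquist for signaling at rate $\frac{1}{T}$ if

$$p(mT) = \delta_{m0} = \begin{cases} 1, & m = 0 \\ 0, & m \neq 0 \end{cases} \qquad (4.16)$$

or equivalently,

$$\frac{1}{T} \sum_{k=-\infty}^{\infty} P\left(f + \frac{k}{T}\right) = 1 \quad \text{for all } f \qquad (4.17)$$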

The proof of this theorem is given in Section 4.5, where we show that both the Nyquist sampling theorem, Theorem 4.3.1, and the preceding theorem are based on the same mathematical result: the samples of a time domain signal have a one-to-one mapping with the sum of translated (or aliased) versions of its Fourier transform.
In this section, we explore the design implications of Theorem 4.3.2. In the frequency domain, the translates of $P(f)$ by integer multiples of $1/T$ must add up to a constant. As illustrated by Figure 4.9, the minimum bandwidth pulse for which this happens is the ideal bandlimited pulse over an interval of length $1/T$.

[Figure 4.9: two panels, "Not Nyquist" and "Nyquist with minimum bandwidth", each showing the translates P(f+1/T), P(f), P(f-1/T), spaced by 1/T.]

Figure 4.9: The minimum bandwidth Nyquist pulse is a sinc.


Minimum bandwidth Nyquist pulse: The minimum bandwidth Nyquist pulse is

$$P(f) = \begin{cases} T, & |f| \le \frac{1}{2T} \\ 0, & \text{else} \end{cases}$$

corresponding to the time domain pulse

$$p(t) = \mathrm{sinc}(t/T)$$

As we have already discussed, the sinc pulse is not a good choice in practice because of its slow decay in time. To speed up the decay in time, we must expand in the frequency domain, while conforming to the Nyquist criterion. The trapezoidal pulse depicted in Figure 4.10 is an example of such a pulse.


Figure 4.10: A trapezoidal pulse which is Nyquist at rate 1/T. The (fractional) excess bandwidth is $\alpha$.


The role of excess bandwidth: We have noted earlier that the problem with the sinc pulse arises because of its $1/t$ decay and the divergence of the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$, which implies that the worst-case contribution from "distant" interfering symbols at a given sampling instant can blow up. Using the same reasoning, however, a pulse $p(t)$ decaying as $1/t^{\delta}$ for $\delta > 1$ should work, since the series $\sum_{n=1}^{\infty} \frac{1}{n^{\delta}}$ does converge for $\delta > 1$. A faster time decay requires a slower decay in frequency. Thus, we need excess bandwidth, beyond the minimum bandwidth dictated by the Nyquist criterion, to fix the problems associated with the sinc pulse. The (fractional) excess bandwidth for a linear modulation scheme is defined to be the fraction of bandwidth over the minimum required for ISI avoidance at a given symbol rate. In particular, Figure 4.10 shows that a trapezoidal pulse (in the frequency domain) can be Nyquist for suitably chosen parameters, since the translates $\{P(f + k/T)\}$ as shown in the figure add up to a constant. Since a trapezoidal $P(f)$ is the convolution of two boxes in the frequency domain, the time domain pulse $p(t)$ is the product of two sinc functions, as worked out in the example below. Since each sinc decays as $1/t$, the product decays as $1/t^2$, which implies that the worst-case ISI with timing mismatch is indeed bounded.
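As a quick numerical sanity check of the claim that the translates of the trapezoid add up to a constant, the sketch below evaluates the left-hand side of (4.17) on a grid of frequencies. It assumes the trapezoid has height $T$, is flat for $|f| \le \frac{1-\alpha}{2T}$ and falls linearly to zero at $|f| = \frac{1+\alpha}{2T}$ (consistent with the pulse in Example 4.3.1); the function names and the example values $T = 2$, $\alpha = 0.5$ are hypothetical. A box narrower than $1/T$ is included as a contrasting non-Nyquist case.

```python
def trapezoid_spectrum(f, T=1.0, alpha=0.5):
    """Trapezoidal P(f): height T for |f| <= (1-alpha)/(2T), linear roll-off to 0 at (1+alpha)/(2T)."""
    f1 = (1 - alpha) / (2 * T)
    f2 = (1 + alpha) / (2 * T)
    af = abs(f)
    if af <= f1:
        return T
    if af >= f2:
        return 0.0
    return T * (f2 - af) / (f2 - f1)


def narrow_box_spectrum(f, T=1.0):
    """Box of height T and total width 0.4/T: narrower than 1/T, so its translates leave gaps."""
    return T if abs(f) <= 0.2 / T else 0.0


def aliased_sum(P, f, T=1.0, k_max=5):
    """Left-hand side of (4.17): (1/T) * sum_k P(f + k/T), truncated to |k| <= k_max.

    The truncation is exact here because both spectra vanish for |f| >= 1/T.
    """
    return sum(P(f + k / T) for k in range(-k_max, k_max + 1)) / T


T, alpha = 2.0, 0.5  # assumed example values


def p_trap(f):
    return trapezoid_spectrum(f, T, alpha)


def p_box(f):
    return narrow_box_spectrum(f, T)


grid = [i / (100 * T) for i in range(-100, 101)]  # frequencies from -1/T to 1/T
trap_ok = all(abs(aliased_sum(p_trap, f, T) - 1) < 1e-12 for f in grid)
box_ok = all(abs(aliased_sum(p_box, f, T) - 1) < 1e-12 for f in grid)
print("trapezoid satisfies (4.17):", trap_ok)
print("narrow box satisfies (4.17):", box_ok)
```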
Example 4.3.1 Consider the trapezoidal pulse of excess bandwidth $\alpha$ shown in Figure 4.10.

(a) Find an explicit expression for the time domain pulse $p(t)$.

(b)  What  is  the  bandwidth  required  for  a  passband  system  using  this  pulse  operating  at  120 
Mbps  using  64QAM,  with  an  excess  bandwidth  of  25%? 

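Solution: (a) It is easy to check that the trapezoid is a convolution of two boxes as follows (we assume $0 < \alpha < 1$):

$$P(f) = \left(T\, I_{[-\frac{1}{2T}, \frac{1}{2T}]}(f)\right) * \left(\frac{T}{\alpha}\, I_{[-\frac{\alpha}{2T}, \frac{\alpha}{2T}]}(f)\right)$$

where $I_{[a,b]}(f)$ denotes the indicator function of the interval $[a,b]$. Taking inverse Fourier transforms, we obtain

$$p(t) = \mathrm{sinc}(t/T)\, \mathrm{sinc}(\alpha t/T) \qquad (4.18)$$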


The presence of the first sinc provides the zeroes required by the time domain Nyquist criterion: $p(mT) = 0$ for all nonzero integers $m$. The presence of a second sinc yields a $1/t^2$ decay, providing robustness against timing mismatch.

(b) Since $64 = 2^6$, the use of 64QAM corresponds to sending 6 bits/symbol, so that the symbol rate is 120/6 = 20 Msymbols/sec. The minimum bandwidth required is therefore 20 MHz, so that 25% excess bandwidth corresponds to a bandwidth of 20 × 1.25 = 25 MHz.
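The time domain expression (4.18) can be checked the same way. The sketch below, again with time normalized by $T$ and with illustrative helper names of our own, confirms the zeros at nonzero integers and compares the worst-case ISI at a $0.25T$ timing offset with that of the sinc pulse, using $\alpha = 0.5$ as an assumed example value.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def trapezoid_pulse(t, alpha):
    """Eq. (4.18) with time in units of T: p(t) = sinc(t) * sinc(alpha * t)."""
    return sinc(t) * sinc(alpha * t)


def worst_case_isi(pulse, offset, num_interferers):
    """Sum of |p(offset - n)| over interfering symbols 0 < |n| <= num_interferers."""
    return sum(abs(pulse(offset - n))
               for n in range(-num_interferers, num_interferers + 1) if n != 0)


alpha, offset = 0.5, 0.25  # assumed: 50% excess bandwidth, 0.25T timing error


def trap(t):
    return trapezoid_pulse(t, alpha)


print("desired sample:", round(trap(offset), 4))
for n_int in (10, 100, 1000, 10000):
    print(f"+/-{n_int:5d} interferers: trapezoid ISI {worst_case_isi(trap, offset, n_int):.4f}, "
          f"sinc ISI {worst_case_isi(sinc, offset, n_int):.4f}")
```

The trapezoid's worst-case ISI settles to a finite value that stays below its desired sample, while the sinc's keeps growing with the truncation length.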


Raised cosine pulse: Replacing the straight line of the trapezoid with a smoother cosine-shaped curve in the frequency domain gives us the raised cosine pulse shown in Figure 4.12, which has a faster, $1/t^3$, decay in the time domain.


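$$P(f) = \begin{cases} T, & |f| \le \frac{1-\alpha}{2T} \\ \frac{T}{2}\left[1 + \cos\left(\frac{\pi T}{\alpha}\left(|f| - \frac{1-\alpha}{2T}\right)\right)\right], & \frac{1-\alpha}{2T} \le |f| \le \frac{1+\alpha}{2T} \\ 0, & |f| > \frac{1+\alpha}{2T} \end{cases}$$

where $\alpha$ is the fractional excess bandwidth, typically chosen in the range $0 < \alpha < 1$. As shown in Problem 4.11, the time domain pulse $p(t)$ is given by

$$p(t) = \mathrm{sinc}\left(\frac{t}{T}\right) \frac{\cos\left(\pi \alpha \frac{t}{T}\right)}{1 - \left(\frac{2 \alpha t}{T}\right)^2}$$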


This pulse inherits the Nyquist property of the sinc pulse, while having an additional multiplicative factor that gives an overall $1/t^3$ decay with time. The faster time decay compared to the sinc pulse is evident from a comparison of Figures 4.12(b) and 4.11(b).
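A direct implementation of the raised cosine time domain pulse needs care at $t = \pm\frac{T}{2\alpha}$, where the second factor takes the form 0/0; with $x = 2\alpha t/T$, the limit of $\frac{\cos(\pi x/2)}{1 - x^2}$ as $x \to 1$ is $\frac{\pi}{4}$ (by L'Hôpital's rule). The sketch below handles that point explicitly, with time normalized by $T$; the tolerance `1e-9` and the function names are assumptions made for illustration.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def raised_cosine_pulse(t, alpha):
    """Raised cosine p(t), time in units of T, excess bandwidth 0 <= alpha <= 1."""
    x = 2 * alpha * t
    if abs(abs(x) - 1.0) < 1e-9:
        # At t = +/- 1/(2 alpha), cos(pi x / 2) / (1 - x^2) is 0/0; its limit is pi/4.
        return sinc(t) * math.pi / 4
    return sinc(t) * math.cos(math.pi * alpha * t) / (1 - x * x)


for t in (0.5, 1.5, 10.5, 100.5):
    print(f"t = {t:6.1f}T: |sinc| = {abs(sinc(t)):.2e}, "
          f"|raised cosine, alpha = 0.5| = {abs(raised_cosine_pulse(t, 0.5)):.2e}")
```

The printed magnitudes at half-integer times decay much faster for the raised cosine than for the sinc, consistent with $1/t^3$ versus $1/t$ behavior; setting $\alpha = 0$ recovers the sinc pulse exactly.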


4.3.3  Bandwidth  efficiency 


We define the bandwidth efficiency of linear modulation with an $M$-ary alphabet as

$$\eta_B = \log_2 M \ \text{bits/symbol}$$


The Nyquist criterion for ISI avoidance says that the minimum bandwidth required for ISI-free transmission using linear modulation equals the symbol rate, using the sinc as the modulation pulse. For such an idealized system, we can think of $\eta_B$ as bits/second per Hertz, since the symbol rate equals the bandwidth. Thus, knowing the bit rate $R_b$ and the bandwidth efficiency $\eta_B$ of the modulation scheme, we can determine the symbol rate, and hence the minimum required bandwidth $B_{min}$, as follows:

$$B_{min} = \frac{R_b}{\eta_B}$$
[Figure 4.11 panels: (a) Frequency domain boxcar; (b) Time domain sinc pulse.]

Figure 4.11: Sinc pulse for minimum bandwidth ISI-free signaling at rate 1/T. Both time and frequency axes are normalized to be dimensionless.


[Figure 4.12 panels: (a) Frequency domain raised cosine X(f), with peak value T and normalized frequency axis marked at 0, ±(1-α)/2, ±1/2, ±(1+α)/2; (b) Time domain pulse (excess bandwidth α = 0.5).]

Figure 4.12: Raised cosine pulse for minimum bandwidth ISI-free signaling at rate 1/T, with excess bandwidth α. Both time and frequency axes are normalized to be dimensionless.
This bandwidth would then be expanded by the excess bandwidth used in the modulating pulse. However, this is not included in our definition of bandwidth efficiency, because excess bandwidth is a highly variable quantity dictated by a variety of implementation considerations. Once we decide on the fractional excess bandwidth $\alpha$, the actual bandwidth required is

$$B = (1 + \alpha) B_{min} = (1 + \alpha) \frac{R_b}{\eta_B}$$
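The bandwidth formula above is simple enough to wrap in a small helper, which reproduces Example 4.3.1(b). The sketch assumes $\eta_B = \log_2 M$ as defined above and takes the excess bandwidth as a fraction; the function name is illustrative.

```python
import math


def required_bandwidth_hz(bit_rate_bps, M, excess_bw):
    """B = (1 + alpha) * R_b / eta_B, with eta_B = log2(M) bits/symbol.

    bit_rate_bps: bit rate R_b in bits/s
    M: constellation size (e.g. 4 for QPSK, 64 for 64QAM)
    excess_bw: fractional excess bandwidth alpha (0.25 means 25%)
    """
    eta_b = math.log2(M)          # bandwidth efficiency, bits/symbol
    b_min = bit_rate_bps / eta_b  # minimum (sinc) bandwidth equals the symbol rate
    return (1 + excess_bw) * b_min


print(required_bandwidth_hz(120e6, 64, 0.25) / 1e6, "MHz")  # Example 4.3.1(b)
```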


4.3.4 Power-bandwidth tradeoffs: a sneak preview

Clearly, we can increase bandwidth efficiency simply by increasing $M$, the constellation size. For example, the bandwidth efficiency of QPSK is 2 bits/symbol, while that of 16QAM is 4 bits/symbol. What stops us from increasing constellation size, and hence bandwidth efficiency, indefinitely is noise, and the fact that we cannot use arbitrarily large transmit power (typically limited by cost or physical and regulatory constraints) to overcome it. Noise in digital communication systems must be modeled statistically; hence, rigorous discussion of a formal model and its design consequences is postponed to Chapters 5 and 6. However, that does not prevent us from giving a handwaving sneak preview of the bottom line here. Note that this subsection is meant as a teaser: it can be safely skipped, since these issues are covered in detail in Chapter 6.


Figure  4.13:  Scaling  of  minimum  distance  and  energy  per  symbol. 


Intuitively speaking, the effect of noise is to perturb constellation points from the nominal locations shown in Figure 4.4, which leads to the possibility of making an error in deciding which point was transmitted. For a given noise "strength" (which determines how much movement the noise can produce), the closer the constellation points, the more likely such errors become. In particular, as we shall see in Chapter 6, the minimum distance between constellation points, termed $d_{min}$, provides a good measure of how vulnerable we are to noise. For a given constellation shape, we can increase $d_{min}$ simply by scaling up the constellation, as shown in Figure 4.13, but this comes with a corresponding increase in energy expenditure. To quantify this, define the energy per symbol $E_s$ for a constellation as the average of the squared Euclidean distances of the points from the origin. For an $M$-ary constellation, each symbol carries $\log_2 M$ bits of information, and we can define the average energy per bit $E_b$ as $E_b = \frac{E_s}{\log_2 M}$. Specifically, $d_{min}$ increases from 2 to 4 by scaling as shown in Figure 4.13. Correspondingly, $E_s = 2$ and $E_b = 1$ are increased to $E_s = 8$ and $E_b = 4$ in Figure 4.13(b). Thus, doubling the minimum distance in Figure 4.13 leads to a four-fold increase in $E_s$ and $E_b$. However, the quantity $\frac{d_{min}^2}{E_b}$ does not change due to scaling; it depends only on the relative geometry of the constellation points. We therefore adopt this scale-invariant measure as our notion of power efficiency for a constellation:

$$\eta_P = \frac{d_{min}^2}{E_b} \qquad (4.19)$$

Since this quantity is scale-invariant, we can choose any convenient scaling in computing it: for QPSK, choosing the scaling on the left in Figure 4.13, we have $d_{min} = 2$, $E_s = 2$, $E_b = 1$, which gives $\eta_P = 4$.
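The scale invariance of $\eta_P$ is easy to confirm numerically. The sketch below computes $d_{min}^2/E_b$ directly from a list of constellation points represented as Python complex numbers; the QPSK points $\pm 1 \pm j$ match the left-hand scaling in Figure 4.13, while the 16QAM points $\{\pm 1, \pm 3\}^2$ are an assumed standard square layout. The function names are our own.

```python
import math
from itertools import combinations


def energy(z):
    """Squared Euclidean distance of point z from the origin."""
    return z.real ** 2 + z.imag ** 2


def power_efficiency(points):
    """eta_P = d_min^2 / E_b for an M-ary constellation given as complex points."""
    M = len(points)
    e_s = sum(energy(p) for p in points) / M  # average energy per symbol
    e_b = e_s / math.log2(M)                  # average energy per bit
    d_min_sq = min(energy(p - q) for p, q in combinations(points, 2))
    return d_min_sq / e_b


qpsk = [complex(i, q) for i in (-1, 1) for q in (-1, 1)]
qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
print("QPSK:", power_efficiency(qpsk))
print("QPSK scaled by 2:", power_efficiency([2 * p for p in qpsk]))
print("16QAM:", power_efficiency(qam16))
```

The 16QAM value, 1.6 = 8/5, agrees with the figure quoted from Problem 4.15 in Example 4.3.2.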

It is important to understand how these quantities relate to physical link parameters. For a given bit rate $R_b$ and received power $P_{rx}$, the energy per bit is given by $E_b = \frac{P_{rx}}{R_b}$. It is worth verifying that the units make sense: the numerator has units of watts, or joules/sec, while the denominator has units of bits/sec, so that $E_b$ has units of joules/bit. We shall see in Chapter 6 that the reliability of communication is determined by the power efficiency $\eta_P$ (a scale-invariant quantity which is a function of the constellation shape) and the dimensionless signal-to-noise ratio (SNR) measure $E_b/N_0$, where $N_0$ is the noise power spectral density, which has units of watts/Hz, or joules. Specifically, the reliability can be approximately characterized by the product $\eta_P \frac{E_b}{N_0}$, so that, for a given desired reliability, the required energy per bit (and hence power) scales inversely as power efficiency for a fixed bit rate. Communication link designers use such concepts as the basis for forming a "link budget" that can be used to choose link parameters such as transmit power, antenna gains and range.

Even based on these rather sketchy and oversimplified arguments, we can draw quick conclusions on the power-bandwidth tradeoffs in using different constellations, as shown in the following example.


Example  4.3.2  We  wish  to  design  a  passband  communication  system  operating  at  a  bit  rate  of 
40  Mbps. 

(a) What is the bandwidth required if we employ QPSK, with an excess bandwidth of 25%?

(b) What if we now employ 16QAM, again with excess bandwidth 25%?

(c) Suppose that the QPSK system in (a) attains a desired reliability when the transmit power is 50 mW. Give an estimate of the transmit power needed for the 16QAM system in (b) to attain a similar reliability.

(d) How do the bandwidth and transmit power required change for the QPSK system if we increase the bit rate to 80 Mbps?

(e) How do the bandwidth and transmit power required change for the 16QAM system if we increase the bit rate to 80 Mbps?

Solution:  (a)  The  bandwidth  efficiency  of  QPSK  is  2  bits/symbol,  hence  the  minimum  bandwidth 
required  is  20  MHz.  For  excess  bandwidth  of  25%,  the  bandwidth  required  is  25  MHz. 

(b) The bandwidth efficiency of 16QAM is 4 bits/symbol, hence, reasoning as in (a), the bandwidth required is 12.5 MHz.

(c) We wish to set $\eta_P E_b/N_0$ to be equal for both systems in order to keep the reliability roughly the same. Assuming that the noise PSD $N_0$ is the same for both systems, the required $E_b$ scales as $1/\eta_P$. Since the bit rates $R_b$ for both systems are equal, the required received power $P = E_b R_b$ (and hence the required transmit power, assuming that received power scales linearly with transmit power) also scales as $1/\eta_P$. We already know that $\eta_P = 4$ for QPSK. It remains to find $\eta_P$ for 16QAM, which is shown in Problem 4.15 to equal 8/5. We therefore conclude that the transmit power for the 16QAM system can be estimated as

$$P_T(\text{16QAM}) = P_T(\text{QPSK}) \frac{\eta_P(\text{QPSK})}{\eta_P(\text{16QAM})}$$

which evaluates to $50 \times \frac{4}{8/5} = 125$ mW.

(d) For fixed bandwidth efficiency, required bandwidth scales linearly with bit rate, hence the new bandwidth required is 50 MHz. In order to maintain a given reliability, we must maintain the same value of $\eta_P E_b/N_0$ as in (c). The power efficiency $\eta_P$ is unchanged, since we are using the same constellation. Assuming that the noise PSD $N_0$ is unchanged, the required energy per bit $E_b$ is unchanged, hence transmit power must scale up linearly with bit rate $R_b$. Thus, the power required using QPSK is now 100 mW.

(e)  Arguing  as  in  (d),  we  require  a  bandwidth  of  25  MHz  and  a  power  of  250  mW  for  16QAM, 
using  the  results  in  (b)  and  (c). 
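The scaling arguments in parts (c) through (e) can be collected into one rule: for fixed reliability, noise PSD and path loss, the required transmit power is proportional to $R_b/\eta_P$. The sketch below encodes that rule (the function name and argument order are our own) and reproduces the powers in the example, using $\eta_P = 4$ for QPSK and $8/5$ for 16QAM as given in the text.

```python
def scaled_transmit_power(p_ref, eta_p_ref, rate_ref, eta_p_new, rate_new):
    """Transmit power needed to keep eta_P * E_b / N_0 fixed.

    E_b must scale as 1/eta_P and P = E_b * R_b, so P scales as R_b / eta_P.
    Assumes the same N_0 and the same path loss for both systems.
    """
    return p_ref * (eta_p_ref / eta_p_new) * (rate_new / rate_ref)


ETA_QPSK, ETA_16QAM = 4.0, 8 / 5
p_qpsk_40 = 50.0  # mW at 40 Mbps, given in part (c)
p_16qam_40 = scaled_transmit_power(p_qpsk_40, ETA_QPSK, 40e6, ETA_16QAM, 40e6)  # part (c)
p_qpsk_80 = scaled_transmit_power(p_qpsk_40, ETA_QPSK, 40e6, ETA_QPSK, 80e6)    # part (d)
p_16qam_80 = scaled_transmit_power(p_qpsk_40, ETA_QPSK, 40e6, ETA_16QAM, 80e6)  # part (e)
print(f"(c) {p_16qam_40:.1f} mW   (d) {p_qpsk_80:.1f} mW   (e) {p_16qam_80:.1f} mW")
```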


4.3.5  The  Nyquist  criterion  at  the  link  level 


[Figure 4.14 block diagram: symbols {b[n]} at rate 1/T enter the link.]

Figure 4.14: Nyquist criterion at the link level.


Figure 4.14 shows a block diagram for a link using linear modulation, with the entire model expressed in complex baseband. The symbols $\{b[n]\}$ are passed through the transmit filter to obtain the waveform $\sum_n b[n]\, g_{TX}(t - nT)$. This then goes through the channel filter $g_C(t)$, and then the receive filter $g_{RX}(t)$. Thus, at the output of the receive filter, we have the linearly modulated signal $\sum_n b[n]\, p(t - nT)$, where $p(t) = (g_{TX} * g_C * g_{RX})(t)$ is the cascade of the transmit, channel and receive filters. We would like the pulse $p(t)$ to be Nyquist at rate $1/T$, so that, in the absence of noise, the symbol rate samples at the output of the receive filter equal the transmitted symbols. Of course, in practice, we do not have control over the channel, hence we often assume an ideal channel, and design such that the cascade of the transmit and receive filters, given by $(g_{TX} * g_{RX})(t) \leftrightarrow G_{TX}(f)\, G_{RX}(f)$, is Nyquist. One possible choice is to set $G_{TX}$ to be a Nyquist pulse, and $G_{RX}$ to be a wideband filter whose response is flat over the band of interest. Another choice that is even more popular is to set $G_{TX}(f)$ and $G_{RX}(f)$ to be square roots of a Nyquist pulse. In particular, the square root raised cosine (SRRC) pulse is often used in practice.

A framework for software simulations of linearly modulated systems with raised cosine and SRRC pulses, including Matlab code fragments, is provided in the appendix, and provides a foundation for Software Lab 4.1.

Square root Nyquist pulses and their time domain interpretation: A pulse $g(t) \leftrightarrow G(f)$ is defined to be square root Nyquist at rate $1/T$ if $|G(f)|^2$ is Nyquist at rate $1/T$. Note that $P(f) = |G(f)|^2 \leftrightarrow p(t) = (g * g_{MF})(t)$, where $g_{MF}(t) = g^*(-t)$. The time domain Nyquist condition is given by

$$p(mT) = (g * g_{MF})(mT) = \int g(t)\, g^*(t - mT)\, dt = \delta_{m0} \qquad (4.20)$$


That is, a square root Nyquist pulse has an autocorrelation function that vanishes at nonzero integer multiples of $T$. In other words, the waveforms $\{g(t - kT),\ k = 0, \pm 1, \pm 2, \ldots\}$ are orthonormal, and can be used to provide a basis for constructing more complex waveforms, as we see in Section 4.3.6.

Food  for  thought:  True  or  False?  Any  pulse  timelimited  to  [0,T]  is  square  root  Nyquist  at 
rate  1/T. 
4.3.6 Linear modulation as a building block

Linear modulation can be used as a building block for constructing more sophisticated waveforms, using discrete-time sequences modulated by square root Nyquist pulses. Thus, one symbol would be made up of multiple "chips," linearly modulated by a square root Nyquist "chip waveform." Specifically, suppose that $\psi(t)$ is square root Nyquist at a chip rate $\frac{1}{T_c}$. If $N$ chips make up one symbol, so that the symbol rate is $\frac{1}{N T_c}$, a symbol waveform is given by linearly modulating a code vector $\mathbf{s} = (s[0], \ldots, s[N-1])$ consisting of $N$ chips, as follows:

$$s(t) = \sum_{k=0}^{N-1} s[k]\, \psi(t - kT_c)$$

Since $\{\psi(t - kT_c)\}$ are orthonormal (see (4.20)), we have simply expressed the code vector in a continuous time basis. Thus, the continuous time inner product between two symbol waveforms (which determines their geometric relationships and their performance in noise, as we see in the next chapter) is equal to the discrete time inner product between the corresponding code vectors. Specifically, suppose that $s_1(t)$ and $s_2(t)$ are two symbol waveforms corresponding to code vectors $\mathbf{s}_1$ and $\mathbf{s}_2$, respectively. Then their inner product satisfies

$$\langle s_1, s_2 \rangle = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} s_1[k]\, s_2^*[l] \int \psi(t - kT_c)\, \psi^*(t - lT_c)\, dt = \sum_{k=0}^{N-1} s_1[k]\, s_2^*[k] = \langle \mathbf{s}_1, \mathbf{s}_2 \rangle$$

where we have used the orthonormality of the translates $\{\psi(t - kT_c)\}$. This means that we can design discrete time code vectors to have certain desired properties, and then linearly modulate square root Nyquist chip waveforms to get symbol waveforms that have the same desired properties. For example, if $\mathbf{s}_1$ and $\mathbf{s}_2$ are orthogonal, then so are $s_1(t)$ and $s_2(t)$; we use this in the next section when we discuss orthogonal modulation.

Examples of square root Nyquist chip waveforms include a rectangular pulse timelimited to an interval of length $T_c$, as well as bandlimited pulses such as the square root raised cosine. From Theorem 4.2.1, we see that the PSD of the modulated waveform is proportional to $|\Psi(f)|^2$ (it is typically a good approximation to assume that the chips $\{s[k]\}$ are uncorrelated). That is, the bandwidth occupancy is determined by that of the chip waveform $\psi$.
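The inner product preservation property can be illustrated numerically for the simplest chip waveform, the unit-energy rectangle $\psi(t) = 1/\sqrt{T_c}$ on $[0, T_c)$. Because that waveform is piecewise constant, a Riemann sum over samples within each chip evaluates the continuous time integral exactly (up to rounding). The code vectors, $T_c = 0.5$, and the helper names below are arbitrary illustrative choices; a bandlimited chip such as the SRRC would instead require genuine numerical integration.

```python
import math


def rect_chip_waveform(code, samples_per_chip, Tc=1.0):
    """Samples of s(t) = sum_k code[k] psi(t - k Tc) for a unit-energy rectangular psi.

    Returns the sample list and the sample spacing dt; s(t) is constant on each chip.
    """
    amp = 1.0 / math.sqrt(Tc)
    samples = [c * amp for c in code for _ in range(samples_per_chip)]
    return samples, Tc / samples_per_chip


def continuous_inner_product(x, y, dt):
    """Riemann sum for <x, y> = integral of x(t) y*(t) dt (exact for piecewise-constant signals)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y)) * dt


def discrete_inner_product(s1, s2):
    """<s1, s2> = sum_k s1[k] s2*[k]."""
    return sum(a * complex(b).conjugate() for a, b in zip(s1, s2))


code1 = [1, 2j, -1, 0.5]
code2 = [1j, 1, 1, -2]
w1, dt = rect_chip_waveform(code1, samples_per_chip=8, Tc=0.5)
w2, _ = rect_chip_waveform(code2, samples_per_chip=8, Tc=0.5)
print("continuous:", continuous_inner_product(w1, w2, dt))
print("discrete:  ", discrete_inner_product(code1, code2))
```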


4.4  Orthogonal  and  Biorthogonal  Modulation 

While linear modulation with larger and larger constellations is a means of increasing bandwidth efficiency, we shall see that orthogonal modulation with larger and larger constellations is a means of increasing power efficiency (at the cost of making the bandwidth efficiency smaller). Consider first $M$-ary frequency shift keying (FSK), a classical form of orthogonal modulation in which one of $M$ sinusoidal tones, successively spaced by $\Delta f$, is transmitted every $T$ units of time, where $\frac{1}{T}$ is the symbol rate. Thus, the bit rate is $\frac{\log_2 M}{T}$, and for a typical symbol interval, the transmitted passband signal is chosen from one of $M$ possibilities:

$$u_{p,k}(t) = \cos\left(2\pi (f_0 + k\Delta f) t\right), \quad 0 \le t \le T, \quad k = 0, 1, \ldots, M - 1$$

where the reference frequency $f_0$ is typically much larger than the tone spacing and the symbol rate. Taking $f_0$ as reference, the corresponding complex baseband waveforms are

$$u_k(t) = \exp\left(j 2\pi k \Delta f t\right), \quad 0 \le t \le T, \quad k = 0, 1, \ldots, M - 1$$
Let us now understand how the tones should be chosen in order to ensure orthogonality. Recall that the passband and complex baseband inner products are related as follows:

$$\langle u_{p,k}, u_{p,l} \rangle = \frac{1}{2} \mathrm{Re} \langle u_k, u_l \rangle$$

so we can develop criteria for orthogonality working in complex baseband. Setting $k = l$, we see that

$$\|u_k\|^2 = T$$

For two adjacent tones, $l = k + 1$, we leave it as an exercise to show that

$$\mathrm{Re} \langle u_k, u_{k+1} \rangle = \frac{\sin 2\pi \Delta f T}{2\pi \Delta f}$$

We see that the minimum value of $\Delta f$ for which the preceding quantity is zero is given by $2\pi \Delta f T = \pi$, or $\Delta f = \frac{1}{2T}$.
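The exercise can be cross-checked numerically. The sketch below approximates $\langle u_k, u_l \rangle = \int_0^T u_k(t)\, u_l^*(t)\, dt$ with the midpoint rule, assuming $T = 1$ (the number of points `n` is an arbitrary accuracy choice and the function names are our own), and compares the real part for adjacent tones with $\frac{\sin 2\pi \Delta f T}{2\pi\Delta f}$. At $\Delta f = \frac{1}{2T}$ the real part vanishes while the imaginary part does not.

```python
import cmath
import math


def tone_inner_product(k, l, delta_f, T=1.0, n=20000):
    """Midpoint-rule approximation of <u_k, u_l> for u_k(t) = exp(j 2 pi k delta_f t) on [0, T]."""
    dt = T / n
    total = 0j
    for i in range(n):
        t = (i + 0.5) * dt
        # u_k(t) * conj(u_l(t)) = exp(j 2 pi (k - l) delta_f t)
        total += cmath.exp(1j * 2 * math.pi * (k - l) * delta_f * t)
    return total * dt


def adjacent_real_part_closed_form(delta_f, T=1.0):
    """Re <u_k, u_{k+1}> = sin(2 pi delta_f T) / (2 pi delta_f)."""
    return math.sin(2 * math.pi * delta_f * T) / (2 * math.pi * delta_f)


for df in (0.3, 0.5, 1.0):
    ip = tone_inner_product(0, 1, df)
    print(f"delta_f = {df}/T: numerical {ip.real:+.6f}{ip.imag:+.6f}j, "
          f"closed-form real part {adjacent_real_part_closed_form(df):+.6f}")
```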

Thus, from the point of view of the receiver, a tone spacing of $\frac{1}{2T}$ ensures that when there is an incoming wave at the $k$th tone, then correlating against the $k$th tone will give a large output, but correlating against the $(k+1)$th tone will give zero output (in the absence of noise). However, this assumes a coherent system in which the tones we are correlating against are synchronized in phase with the incoming wave. What happens if they are 90° out of phase? Then correlation of the $k$th tone with itself yields

$$\int_0^T \cos\left(2\pi (f_0 + k\Delta f) t\right) \cos\left(2\pi (f_0 + k\Delta f) t + \frac{\pi}{2}\right) dt \approx 0$$

(by orthogonality of the cosine and sine), so that the output we desire to be large is essentially zero! Robustness to such variations can be obtained by employing noncoherent reception, which we describe next.

Noncoherent reception: Let us develop the concept of noncoherent reception in generality, because it is a concept that is useful in many settings, not just for orthogonal modulation. Suppose that we transmit a passband waveform, and wish to detect it at the receiver by correlating it against the receiver's copy of the waveform. However, the receiver's local oscillator may not be synchronized in phase with the phase of the incoming wave. Let us denote the receiver's copy of the signal as

$$u_p(t) = u_c(t) \cos 2\pi f_c t - u_s(t) \sin 2\pi f_c t$$

and the incoming passband signal as

$$y_p(t) = y_c(t) \cos 2\pi f_c t - y_s(t) \sin 2\pi f_c t = u_c(t) \cos(2\pi f_c t + \theta) - u_s(t) \sin(2\pi f_c t + \theta)$$

Using the receiver's local oscillator as reference, the complex envelope of the receiver's copy is $u(t) = u_c(t) + j u_s(t)$, while that of the incoming wave is $y(t) = u(t) e^{j\theta}$. Thus, the inner product

$$\langle y_p, u_p \rangle = \frac{1}{2} \mathrm{Re} \langle y, u \rangle = \frac{1}{2} \mathrm{Re} \langle u e^{j\theta}, u \rangle = \frac{1}{2} \mathrm{Re}\left(\|u\|^2 e^{j\theta}\right) = \frac{\|u\|^2}{2} \cos\theta$$

Thus, the output of the correlator is degraded by the factor $\cos\theta$, and can actually become zero, as we have already observed, if the phase offset $\theta = \pi/2$. In order to get around this problem, let us look at the complex baseband inner product again:

$$\langle y, u \rangle = \langle u e^{j\theta}, u \rangle = \|u\|^2 e^{j\theta}$$
We could ensure that this output remains large regardless of the value of $\theta$ if we took its magnitude, rather than the real part. Thus, noncoherent reception corresponds to computing $|\langle y, u \rangle|$ or $|\langle y, u \rangle|^2$. Let us unwrap the complex inner product to see what this entails:

$$\langle y, u \rangle = \int y(t)\, u^*(t)\, dt = \left(\langle y_c, u_c \rangle + \langle y_s, u_s \rangle\right) + j\left(\langle y_s, u_c \rangle - \langle y_c, u_s \rangle\right)$$

Thus, the noncoherent receiver computes the quantity

$$|\langle y, u \rangle|^2 = \left(\langle y_c, u_c \rangle + \langle y_s, u_s \rangle\right)^2 + \left(\langle y_s, u_c \rangle - \langle y_c, u_s \rangle\right)^2$$

In contrast, the coherent receiver computes

$$\mathrm{Re}\langle y, u \rangle = \langle y_c, u_c \rangle + \langle y_s, u_s \rangle$$

That is, when the receiver LO is synchronized to the phase of the incoming wave, we can correlate the I component of the received waveform with the I component of the receiver's copy, and similarly correlate the Q components, and sum them up. However, in the presence of phase asynchrony, the I and Q components get mixed up, and we must compute the magnitude of the complex inner product to recover all the energy of the incoming wave. Figure 4.15 shows the receiver operations corresponding to coherent and noncoherent reception.


Figure  4.15:  Structure  of  coherent  and  noncoherent  receivers. 
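The two receiver computations can be compared directly on sampled complex envelopes. In the sketch below, the envelope $u(t) = \cos 2\pi t + 0.5j$ on $[0, 1)$ is an arbitrary illustrative choice, integrals are replaced by Riemann sums with spacing `dt`, and the incoming envelope is modeled as $y(t) = u(t)e^{j\theta}$ as in the text; the function names are our own. The coherent output follows $\|u\|^2\cos\theta$, while the noncoherent statistic stays at $\|u\|^4$ for every phase offset.

```python
import cmath
import math


def real_inner(x, y, dt):
    """Riemann-sum approximation of <x, y> for real-valued sample lists."""
    return sum(a * b for a, b in zip(x, y)) * dt


def receiver_statistics(u, theta, dt):
    """Return (coherent, noncoherent) statistics for incoming envelope y = u * exp(j theta)."""
    y = [s * cmath.exp(1j * theta) for s in u]
    yc, ys = [s.real for s in y], [s.imag for s in y]
    uc, us = [s.real for s in u], [s.imag for s in u]
    i_part = real_inner(yc, uc, dt) + real_inner(ys, us, dt)  # Re<y, u>
    q_part = real_inner(ys, uc, dt) - real_inner(yc, us, dt)  # Im<y, u>
    coherent = i_part                        # uses only the I/I and Q/Q correlations
    noncoherent = i_part ** 2 + q_part ** 2  # |<y, u>|^2
    return coherent, noncoherent


dt = 1e-3
u = [complex(math.cos(2 * math.pi * i * dt), 0.5) for i in range(1000)]  # illustrative envelope
energy = sum(abs(s) ** 2 for s in u) * dt  # ||u||^2
for theta_deg in (0, 45, 90, 180):
    coh, noncoh = receiver_statistics(u, math.radians(theta_deg), dt)
    print(f"theta = {theta_deg:3d} deg: coherent {coh:+.4f}, noncoherent {noncoh:.4f}")
```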


Back to FSK: Going back to FSK, if we now use noncoherent reception, then in order to ensure that we get a zero output (in the absence of noise) when receiving the $k$th tone with a noncoherent receiver for the $(k+1)$th tone, we must ensure that

$$|\langle u_k, u_{k+1} \rangle| = 0$$

We leave it as an exercise (Problem 4.18) to show that the minimum tone spacing for noncoherent FSK is $\frac{1}{T}$, which is double that required for orthogonality in coherent FSK. The bandwidth for coherent $M$-ary FSK is approximately $\frac{M}{2T}$, which corresponds to a time-bandwidth product of approximately $\frac{M}{2}$. This corresponds to a complex vector space of dimension $\frac{M}{2}$, or a real vector space of dimension $M$, in which we can fit $M$ orthogonal signals. On the other hand, $M$-ary noncoherent signaling requires $M$ complex dimensions, since the complex baseband signals must remain orthogonal even under multiplication by complex-valued scalars.
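The two spacing results can be compared using the closed form $\langle u_k, u_{k+1} \rangle = \int_0^T e^{-j2\pi\Delta f t}\,dt = \frac{1 - e^{-j2\pi\Delta f T}}{j2\pi\Delta f}$, which follows from a one-line integration. The sketch below scans $\Delta f$ on a grid of step $0.001/T$ (an arbitrary resolution, with $T = 1$ assumed and hypothetical function names) and reports the smallest spacing meeting each criterion: the coherent one needs only the real part to vanish, the noncoherent one needs the magnitude to vanish.

```python
import cmath
import math


def adjacent_tone_correlation(delta_f, T=1.0):
    """Closed form of <u_k, u_{k+1}> = integral_0^T exp(-j 2 pi delta_f t) dt."""
    w = 2 * math.pi * delta_f
    return (1 - cmath.exp(-1j * w * T)) / (1j * w)


def min_orthogonal_spacing(statistic, T=1.0, steps_per_inverse_T=1000, max_multiple=3, tol=1e-9):
    """Smallest grid value of delta_f > 0 for which statistic(<u_k, u_{k+1}>) < tol."""
    for i in range(1, max_multiple * steps_per_inverse_T + 1):
        delta_f = i / (steps_per_inverse_T * T)
        if statistic(adjacent_tone_correlation(delta_f, T)) < tol:
            return delta_f
    return None


coherent = min_orthogonal_spacing(lambda c: abs(c.real))  # Re<u_k, u_{k+1}> = 0
noncoherent = min_orthogonal_spacing(abs)                 # |<u_k, u_{k+1}>| = 0
print("coherent FSK spacing:", coherent, "x 1/T")
print("noncoherent FSK spacing:", noncoherent, "x 1/T")
```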

Summarizing the concept of orthogonality: To summarize, when we say "orthogonal" modulation, we must specify whether we mean coherent or noncoherent reception, because the concept of orthogonality is different in the two cases. For a signal set $\{s_k(t)\}$, orthogonality requires that, for $k \neq l$, we have

$$\begin{aligned} \mathrm{Re}\langle s_k, s_l \rangle &= 0 && \text{coherent orthogonality criterion} \\ \langle s_k, s_l \rangle &= 0 && \text{noncoherent orthogonality criterion} \end{aligned} \qquad (4.21)$$


Bandwidth efficiency: We conclude from the example of orthogonal FSK that the bandwidth efficiency of orthogonal signaling is $\eta_B = \frac{2 \log_2 M}{M}$ bits/complex dimension for coherent systems, and $\eta_B = \frac{\log_2 M}{M}$ bits/complex dimension for noncoherent systems. This is a general observation that holds for any realization of orthogonal signaling. In a signal space of complex dimension $D$ (and hence real dimension $2D$), we can fit $2D$ signals satisfying the coherent orthogonality criterion, but only $D$ signals satisfying the noncoherent orthogonality criterion. As $M$ gets large, the bandwidth efficiency tends to zero. In compensation, as we see in Chapter 6, the power efficiency of orthogonal signaling for large $M$ is the "best possible."

Orthogonal  Walsh-Hadamard  codes 

Section 4.3.6 shows how to map vectors to waveforms while preserving inner products, by using linear modulation with a square root Nyquist chip waveform. Applying this construction, the problem of designing orthogonal waveforms $\{s_i(t)\}$ now reduces to designing orthogonal code vectors $\{\mathbf{s}_i\}$. Walsh-Hadamard codes are a standard construction employed for this purpose, and can be constructed recursively as follows: at the $n$th stage, we generate $2^n$ orthogonal vectors, using the $2^{n-1}$ vectors constructed in the $(n-1)$th stage. Let $H_n$ denote a matrix whose rows are the $2^n$ orthogonal codes obtained after the $n$th stage, with $H_0 = (1)$. Then

$$H_n = \begin{pmatrix} H_{n-1} & H_{n-1} \\ H_{n-1} & -H_{n-1} \end{pmatrix}$$

We therefore get

$$H_2 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$


Figure 4.16 depicts the waveforms corresponding to the 4-ary signal set in $H_2$ using a rectangular timelimited chip waveform to go from sequences to signals, as described in Section 4.3.6.

The signals $\{s_i\}$ obtained above can be used for noncoherent orthogonal signaling, since they satisfy the orthogonality criterion $\langle s_i, s_j \rangle = 0$ for $i \neq j$. However, just as for FSK, we can fit twice as many signals into the same number of degrees of freedom if we use the weaker notion of orthogonality required for coherent signaling, namely $\mathrm{Re}\langle s_i, s_j \rangle = 0$ for $i \neq j$. It is easy to check that for $M$-ary Walsh-Hadamard signals $\{s_i, i = 1, \ldots, M\}$, we can get $2M$ orthogonal signals for coherent signaling: $\{s_i, j s_i, i = 1, \ldots, M\}$. This construction corresponds to independently modulating the I and Q components with a Walsh-Hadamard code; that is, using passband waveforms $s_i(t) \cos 2\pi f_c t$ and $-s_i(t) \sin 2\pi f_c t$ (the negative sign is only to conform to our convention for I and Q, and can be dropped, which corresponds to replacing $j s_i$ by $-j s_i$ in complex baseband), $i = 1, \ldots, M$.
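The recursion for $H_n$ and the doubling construction for coherent signaling translate directly into code. The sketch below builds the rows of $H_n$ as lists of $\pm 1$, checks the noncoherent criterion $\langle s_i, s_j \rangle = 0$ for the Walsh-Hadamard set, and checks the coherent criterion $\mathrm{Re}\langle s_i, s_j \rangle = 0$ for the enlarged set $\{s_i, j s_i\}$; the helper names are hypothetical.

```python
from itertools import combinations


def hadamard(n):
    """Rows of H_n, built via H_n = [[H_{n-1}, H_{n-1}], [H_{n-1}, -H_{n-1}]] from H_0 = (1)."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def inner(a, b):
    """Complex inner product <a, b> = sum_k a[k] b*[k]."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


walsh = hadamard(2)  # M = 4 codes satisfying the noncoherent criterion
for row in walsh:
    print(row)

coherent_set = walsh + [[1j * x for x in row] for row in walsh]  # {s_i, j s_i}: 2M = 8 codes

noncoherent_ok = all(abs(inner(a, b)) < 1e-12 for a, b in combinations(walsh, 2))
coherent_ok = all(abs(inner(a, b).real) < 1e-12 for a, b in combinations(coherent_set, 2))
# The enlarged set is NOT noncoherent orthogonal, since <s_i, j s_i> = -j M.
enlarged_noncoherent_ok = all(abs(inner(a, b)) < 1e-12 for a, b in combinations(coherent_set, 2))
print(noncoherent_ok, coherent_ok, enlarged_noncoherent_ok)
```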

Biorthogonal  modulation 

Given an orthogonal signal set, a biorthogonal signal set of twice the size can be obtained by including a negated copy of each signal. Since signals $s$ and $-s$ cannot be distinguished in a noncoherent system, biorthogonal signaling is applicable only to coherent systems. Thus, for an $M$-ary Walsh-Hadamard signal set $\{s_i\}$ with $M$ signals obeying the noncoherent orthogonality criterion, we can construct a coherent orthogonal signal set $\{s_i, js_i\}$ of size $2M$, and hence a biorthogonal signal set of size $4M$, e.g., $\{s_i, js_i, -s_i, -js_i\}$. These correspond to the $4M$ passband waveforms $\pm s_i(t)\cos 2\pi f_c t$ and $\pm s_i(t)\sin 2\pi f_c t$, $i = 1, \dots, M$.
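A similar check confirms the structure of the biorthogonal set of size $4M$: every pair of distinct signals either satisfies $\mathrm{Re}\langle s_a, s_b\rangle = 0$ or is an antipodal pair with $\mathrm{Re}\langle s_a, s_b\rangle = -\|s_a\|^2$. As before, this is an illustrative Python sketch on the code vectors of $H_2$ ($M = 4$), with helper names of our own.

```python
def walsh_hadamard(n):
    """H_n via H_n = [[H_{n-1}, H_{n-1}], [H_{n-1}, -H_{n-1}]], with H_0 = (1)."""
    H = [[1]]
    for _ in range(n):
        H = [r + r for r in H] + [r + [-x for x in r] for r in H]
    return H


def cinner(u, v):
    """Complex inner product <u, v> = sum_k u_k conj(v_k)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


M = 4
S = walsh_hadamard(2)
coh = S + [[1j * x for x in s] for s in S]   # {s_i, j s_i}: size 2M
bi = coh + [[-x for x in s] for s in coh]    # {s_i, j s_i, -s_i, -j s_i}: size 4M
E = float(M)                                 # ||s_i||^2 for a length-M +/-1 code


def expected_real_inner(a, b):
    """Energy E on the diagonal, -E for the antipodal partner
    (index shifted by 2M), and 0 for every other pair."""
    if a == b:
        return E
    if b == (a + 2 * M) % (4 * M):
        return -E
    return 0.0


ok = all(abs(cinner(u, v).real - expected_real_inner(a, b)) < 1e-12
         for a, u in enumerate(bi) for b, v in enumerate(bi))
print(len(bi), ok)   # 16 True
```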
Figure 4.16: Walsh-Hadamard codes for 4-ary orthogonal modulation.


4.5  Proofs  of  the  Nyquist  theorems 

We have used Nyquist’s sampling theorem, Theorem 4.3.1, to argue that linear modulation using the sinc pulse is able to use all the degrees of freedom in a bandlimited channel. On the other hand, Nyquist’s criterion for ISI avoidance, Theorem 4.3.2, tells us, roughly speaking, that we must have enough degrees of freedom in order to avoid ISI (and that the sinc pulse provides the minimum such degrees of freedom). As it turns out, both theorems are based on the same mathematical relationship between samples in the time domain and aliased spectra in the frequency domain, stated in the following theorem.

Theorem 4.5.1 (Sampling and Aliasing): Consider a signal $s(t)$, sampled at rate $\frac{1}{T_s}$. Let $S(f)$ denote the spectrum of $s(t)$, and let

$$B(f) = \frac{1}{T_s}\sum_{k=-\infty}^{\infty} S\left(f + \frac{k}{T_s}\right) \qquad (4.22)$$

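denote the sum of translates of the spectrum. Then the following observations hold:

(a) $B(f)$ is periodic with period $\frac{1}{T_s}$.

(b) The samples $\{s(nT_s)\}$ are the Fourier series for $B(f)$, satisfying

$$s(nT_s) = T_s \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} B(f)\, e^{j2\pi f n T_s}\, df \qquad (4.23)$$

$$B(f) = \sum_{n=-\infty}^{\infty} s(nT_s)\, e^{-j2\pi f n T_s} \qquad (4.24)$$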


Remark:  Note  that  the  signs  of  the  exponents  for  the  frequency  domain  Fourier  series  in  the 
theorem  are  reversed  from  the  convention  in  the  usual  time  domain  Fourier  series  (analogous  to 
the  reversal  of  the  sign  of  the  exponent  for  the  inverse  Fourier  transform  compared  to  the  Fourier 
transform).
Proof of Theorem 4.5.1: The periodicity of $B(f)$ follows by its very construction. To prove (b), apply the inverse Fourier transform to obtain

$$s(nT_s) = \int_{-\infty}^{\infty} S(f)\, e^{j2\pi f n T_s}\, df$$


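We now write the integral as an infinite sum of integrals over segments of length $\frac{1}{T_s}$:

$$s(nT_s) = \sum_{k=-\infty}^{\infty} \int_{\frac{k - 1/2}{T_s}}^{\frac{k + 1/2}{T_s}} S(f)\, e^{j2\pi f n T_s}\, df$$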


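In the integral over the $k$th segment, make the substitution $u = f - \frac{k}{T_s}$ and rewrite it as

$$\int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} S\left(u + \frac{k}{T_s}\right) e^{j2\pi \left(u + \frac{k}{T_s}\right) n T_s}\, du = \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} S\left(u + \frac{k}{T_s}\right) e^{j2\pi u n T_s}\, du$$

since $e^{j2\pi kn} = 1$ for integers $k$ and $n$.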

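Now that the limits of all segments and the complex exponential in the integrand are the same (i.e., independent of $k$), we can move the summation inside to obtain

$$s(nT_s) = \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} \left(\sum_{k=-\infty}^{\infty} S\left(u + \frac{k}{T_s}\right)\right) e^{j2\pi u n T_s}\, du = T_s \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} B(u)\, e^{j2\pi u n T_s}\, du,$$

proving (4.23). We can now recognize that this is just the formula for the Fourier series coefficients of $B(f)$, from which (4.24) follows. □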
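Theorem 4.5.1 holds at any sampling rate, including rates too low for recovery. A quick numerical sanity check, sketched here in Python, compares the definition (4.22) of $B(f)$ with the Fourier series (4.24) built from the samples. The test signal $s(t) = e^{-\pi t^2}$, whose Fourier transform is $S(f) = e^{-\pi f^2}$, is our own choice because both sums converge very fast; the truncation limits `K` and `N` are likewise assumptions.

```python
import cmath
import math


def s(t):
    """Test signal s(t) = exp(-pi t^2); its Fourier transform is S(f) = exp(-pi f^2)."""
    return math.exp(-math.pi * t * t)


def S(f):
    return math.exp(-math.pi * f * f)


def B_from_spectrum(f, Ts, K=60):
    """Definition (4.22): (1/Ts) sum_k S(f + k/Ts), truncated to |k| <= K."""
    return sum(S(f + k / Ts) for k in range(-K, K + 1)) / Ts


def B_from_samples(f, Ts, N=60):
    """Fourier series (4.24) built from the samples s(n Ts), truncated to |n| <= N."""
    return sum(s(n * Ts) * cmath.exp(-2j * math.pi * f * n * Ts)
               for n in range(-N, N + 1))


for Ts in (0.25, 0.5, 1.5):    # Ts = 1.5 undersamples heavily; (4.24) still holds
    for f in (0.0, 0.13, 0.4):
        err = abs(B_from_spectrum(f, Ts) - B_from_samples(f, Ts))
        print(Ts, f, err)      # err is at the level of floating point round-off
```

What fails at low sampling rates is not the identity itself but the ability to extract $S(f)$ from $B(f)$, which is the subject of the next argument.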


[Figure 4.17 panels: "Sampling rate not high enough to recover S(f) from B(f)" and "Sampling rate high enough to recover S(f) from B(f)"; each shows $S(f)$, of bandwidth $W$, together with its translates $S(f - 1/T_s)$ and $S(f + 1/T_s)$.]

Figure  4.17:  Recovering  a  signal  from  its  samples  requires  a  high  enough  sampling  rate  for 
translates  of  the  spectrum  not  to  overlap. 


Inferring Nyquist’s sampling theorem from Theorem 4.5.1: Suppose that $s(t)$ is bandlimited to $[-\frac{W}{2}, \frac{W}{2}]$. The samples of $s(t)$ at rate $\frac{1}{T_s}$ can be used to reconstruct $B(f)$, since they are the Fourier series for $B(f)$. But $S(f)$ can be recovered from $B(f)$ if and only if the translates $S(f - \frac{k}{T_s})$ do not overlap, as shown in Figure 4.17. This happens if and only if $\frac{1}{T_s} \geq W$. Once this condition is satisfied, $\frac{1}{T_s}S(f)$ can be recovered from $B(f)$ by passing it through an ideal bandlimited filter $H(f) = I_{[-W/2, W/2]}(f)$. We therefore obtain that

$$\frac{1}{T_s}S(f) = B(f)H(f) = \sum_{n=-\infty}^{\infty} s(nT_s)\, e^{-j2\pi f n T_s}\, I_{[-W/2, W/2]}(f) \qquad (4.25)$$

Noting that $I_{[-W/2, W/2]}(f) \leftrightarrow W\,\mathrm{sinc}(Wt)$, we have

$$e^{-j2\pi f n T_s}\, I_{[-W/2, W/2]}(f) \leftrightarrow W\,\mathrm{sinc}\left(W(t - nT_s)\right)$$
Taking inverse Fourier transforms, we get the interpolation formula

$$\frac{1}{T_s}s(t) = \sum_{n=-\infty}^{\infty} s(nT_s)\, W\,\mathrm{sinc}\left(W(t - nT_s)\right),$$

which reduces to (4.15) for $\frac{1}{T_s} = W$. This completes the proof of the sampling theorem, Theorem 4.3.1. □
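The interpolation formula can be tried numerically. This Python sketch assumes the test signal $s(t) = \mathrm{sinc}^2(Wt/2)$ (our choice), whose triangular spectrum is supported on $[-W/2, W/2]$, normalizes $W = 1$, and truncates the infinite sum to $|n| \leq N$. It reconstructs $s(t)$ from samples taken at the Nyquist rate $1/T_s = W$ and at a 50% higher rate, in both cases using the filter bandwidth $W$ as in (4.25).

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


W = 1.0   # assumed bandwidth; s(t) below occupies [-W/2, W/2]


def s(t):
    """sinc^2(W t / 2): its spectrum is a triangle supported on [-W/2, W/2]."""
    return sinc(W * t / 2) ** 2


def interpolate(t, Ts, N=2000):
    """s(t) = Ts * sum_n s(n Ts) W sinc(W (t - n Ts)), truncated to |n| <= N.
    Valid when the sampling rate 1/Ts is at least W."""
    return Ts * sum(s(n * Ts) * W * sinc(W * (t - n * Ts)) for n in range(-N, N + 1))


for rate in (W, 1.5 * W):         # Nyquist rate and 50% oversampling
    Ts = 1.0 / rate
    for t in (0.37, 1.9, -2.6):
        print(rate, t, abs(interpolate(t, Ts) - s(t)))   # small truncation error only
```

The residual error comes only from truncating the sum; it shrinks as `N` grows because the samples of this particular $s(t)$ decay like $1/n^2$.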


Inferring Nyquist’s criterion for ISI avoidance from Theorem 4.5.1: A Nyquist pulse $p(t)$ at rate $1/T$ must satisfy $p(nT) = \delta_{n0}$. Applying Theorem 4.5.1 with $s(t) = p(t)$ and $T_s = T$, it follows immediately from (4.24) that $p(nT) = \delta_{n0}$ (i.e., the time domain Nyquist criterion holds) if and only if

$$B(f) = \frac{1}{T}\sum_{k=-\infty}^{\infty} P\left(f + \frac{k}{T}\right) = 1$$

In other words, if the Fourier series only has a DC term, then the periodic waveform it corresponds to must be constant. □
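The frequency domain form of the criterion can be checked for the raised cosine pulse by summing its aliases directly. The sketch below normalizes $T = 1$ and uses the standard raised cosine spectrum with excess bandwidth $a$ (flat at level $T$ for $|f| \leq \frac{1-a}{2T}$, with a cosine rolloff out to $\frac{1+a}{2T}$); the function names and frequency grid are our own assumptions. A boxcar narrower than $1/T$ serves as a counterexample.

```python
import math


def raised_cosine_spectrum(f, T, a):
    """Raised cosine P(f) with symbol time T and excess bandwidth a (0 < a <= 1)."""
    f = abs(f)
    f1 = (1 - a) / (2 * T)
    f2 = (1 + a) / (2 * T)
    if f <= f1:
        return T
    if f <= f2:
        return (T / 2) * (1 + math.cos(math.pi * T / a * (f - f1)))
    return 0.0


def aliased_sum(P, f, T, K=5):
    """B(f) = (1/T) sum_k P(f + k/T), truncated to |k| <= K (enough for these finite-support P)."""
    return sum(P(f + k / T) for k in range(-K, K + 1)) / T


T = 1.0                                    # assumed normalization
grid = [m / 97 for m in range(-50, 51)]    # frequencies spanning about [-0.52, 0.52]
for a in (0.25, 0.5, 1.0):
    P = lambda f, a=a: raised_cosine_spectrum(f, T, a)
    worst = max(abs(aliased_sum(P, f, T) - 1.0) for f in grid)
    print(a, worst)   # round-off level: the aliased copies sum to the constant 1

# Counterexample: an ideal boxcar narrower than 1/T leaves gaps between aliases.
narrow = lambda f: T if abs(f) <= 0.4 / T else 0.0
print(aliased_sum(narrow, 0.45 / T, T))   # 0.0, so this pulse is not Nyquist at rate 1/T
```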


4.6  Concept  Inventory 

This chapter provides an introduction to how bits can be translated to information-carrying signals which satisfy certain constraints (e.g., fitting within a given frequency band). We focus on linear modulation over passband channels.

Modulation  basics 

• Information bits can be encoded into two-dimensional (complex-valued) constellations, which can be modulated onto baseband pulses to produce a complex baseband waveform. Constellations may carry information in both amplitude and phase (e.g., QAM) or in phase only (e.g., PSK). This modulated waveform can then be upconverted to the appropriate frequency band for passband signaling.

• The PSD of a linearly modulated waveform using pulse $p(t)$ is proportional to $|P(f)|^2$, so that the choice of modulating pulse is critical for determining bandwidth occupancy. Fractional power containment provides a useful notion of bandwidth.

• Timelimited pulses with sharp edges have large bandwidth, but this can be reduced by smoothing out the edges (e.g., by replacing a rectangular pulse with a trapezoidal pulse or a sinusoidal pulse).

Degrees  of  freedom 

• Nyquist’s sampling theorem says that a signal bandlimited over $[-W/2, W/2]$ is completely characterized by its samples at rate $W$ (or higher). Applying this to the complex envelope of a passband signal of bandwidth $W$, we infer that a passband channel of bandwidth $W$ provides $W$ complex-valued degrees of freedom per unit time for carrying information.

• The (time domain) sinc pulse, which corresponds to a frequency domain boxcar, allows us to utilize all degrees of freedom in a bandlimited channel, but it decays too slowly, at rate $1/t$, for practical use: it can lead to unbounded signal amplitude and, in the presence of timing mismatch, unbounded ISI.

ISI  avoidance 

•  The  Nyquist  criterion  for  ISI  avoidance  requires  that  the  end-to-end  signaling  pulse  vanish 
at  nonzero  integer  multiples  of  the  symbol  time.  In  the  frequency  domain,  this  corresponds  to 
aliased  versions  of  the  pulse  summing  to  a  constant. 

• The sinc pulse is the minimum bandwidth Nyquist pulse, but decays too slowly with time. It can be replaced, at the expense of some excess bandwidth, by pulses with less sharp transitions in the frequency domain to obtain faster decay in time. The raised cosine pulse is a popular choice, giving a $1/|t|^3$ decay.

• If the receive filter is matched to the transmit filter, each has to be a square root Nyquist pulse, with their cascade being Nyquist. The SRRC is a popular choice.

Power-bandwidth  tradeoffs 

• For an $M$-ary constellation, the bandwidth efficiency is $\log_2 M$ bits per symbol, so that larger constellations are more bandwidth-efficient.

• The power efficiency for a constellation is well characterized by the scale-invariant quantity $\eta_P = \frac{d_{\min}^2}{E_b}$. Large constellations are typically less power-efficient.
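Both efficiency measures are easy to compute for a given point set. The Python sketch below takes the power efficiency to be $\eta_P = d_{\min}^2/E_b$ (the definition used in Problem 4.15), assumes equiprobable points, and evaluates a few standard constellations; the scalings are our own choices and do not matter, since $\eta_P$ is scale-invariant.

```python
import cmath
import math
from itertools import combinations


def efficiencies(points):
    """Return (bits/symbol, eta_P = d_min^2 / E_b) for an equiprobable constellation."""
    M = len(points)
    bits = math.log2(M)
    d_min_sq = min(abs(p - q) ** 2 for p, q in combinations(points, 2))
    Es = sum(abs(p) ** 2 for p in points) / M    # average energy per symbol
    Eb = Es / bits                               # average energy per bit
    return bits, d_min_sq / Eb


constellations = {
    "BPSK": [1, -1],
    "QPSK": [complex(x, y) for x in (-1, 1) for y in (-1, 1)],
    "8PSK": [cmath.exp(2j * math.pi * k / 8) for k in range(8)],
    "16QAM": [complex(x, y) for x in (-3, -1, 1, 3) for y in (-3, -1, 1, 3)],
}
for name, pts in constellations.items():
    bits, eta = efficiencies(pts)
    print(f"{name:6s} {bits:.0f} bits/symbol  eta_P = {eta:.3f}")
```

With these definitions, BPSK and QPSK both give $\eta_P = 4$, 8PSK about 1.76, and 16QAM 1.6, while the bandwidth efficiency grows from 1 to 4 bits per symbol.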

Beyond  linear  modulation 

• Linear modulation using square root Nyquist pulses can be used to translate signal design from discrete time to continuous time while preserving geometric relationships such as inner products. This is because, if $\psi(t)$ is square root Nyquist at rate $1/T_c$, then $\{\psi(t - kT_c)\}$, its translates by integer multiples of $T_c$, form an orthonormal basis.

•  Orthogonal  modulation  can  be  used  with  either  coherent  or  noncoherent  reception,  but  the 
concept  of  orthogonality  is  more  stringent  (eating  up  more  degrees  of  freedom)  for  noncoherent 
orthogonal  signaling.  Waveforms  for  orthogonal  modulation  can  be  constructed  in  a  variety 
of  ways,  including  FSK  and  Walsh-Hadamard  sequences  modulated  onto  square  root  Nyquist 
pulses.  Biorthogonal  signaling  doubles  the  signaling  alphabet  for  coherent  orthogonal  signaling 
by  adding  the  negative  of  each  signal  to  the  constellation. 
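The preservation of inner products can be seen numerically. In this illustrative Python sketch (all names are ours), each code vector is modulated onto a unit-energy half-sine chip of duration $T_c$, which is timelimited to $T_c$ and therefore square root Nyquist at rate $1/T_c$; the waveform inner product, computed by numerical integration, is compared with the inner product of the code vectors.

```python
import math


def chip(t, Tc):
    """Half-sine chip sqrt(2/Tc) sin(pi t / Tc) on [0, Tc), zero elsewhere.
    It has unit energy, and its translates by multiples of Tc do not overlap."""
    if 0.0 <= t < Tc:
        return math.sqrt(2.0 / Tc) * math.sin(math.pi * t / Tc)
    return 0.0


def waveform(code, Tc):
    """s(t) = sum_k code[k] chip(t - k Tc)."""
    def s(t):
        k = int(t // Tc)          # only the kth chip can be nonzero at time t
        if 0 <= k < len(code):
            return code[k] * chip(t - k * Tc, Tc)
        return 0.0
    return s


def waveform_inner(c1, c2, Tc=1.0, samples_per_chip=64):
    """Midpoint-rule approximation of the integral of s1(t) s2(t) over the code duration."""
    s1, s2 = waveform(c1, Tc), waveform(c2, Tc)
    n = samples_per_chip * len(c1)
    dt = len(c1) * Tc / n
    return sum(s1((m + 0.5) * dt) * s2((m + 0.5) * dt) for m in range(n)) * dt


def code_inner(c1, c2):
    return sum(a * b for a, b in zip(c1, c2))


pairs = [([1, 1, -1, -1], [1, -1, -1, 1]),          # two Walsh-Hadamard rows
         ([1, -1, 1, -1], [1, -1, 1, -1]),
         ([0.5, 2.0, -1.0, 3.0], [1.0, -1.0, 2.0, 0.5])]
for c1, c2 in pairs:
    print(code_inner(c1, c2), waveform_inner(c1, c2))  # the two values agree
```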

Sampling  and  aliasing 

• Time domain sampling corresponds to frequency domain aliasing. Specifically, the samples of a waveform $x(t)$ at rate $1/T$ are the Fourier series for the periodic frequency domain waveform $\frac{1}{T}\sum_{k} X(f - k/T)$ obtained by summing the frequency domain waveform and its aliases $X(f - k/T)$ ($k$ integer).

• The Nyquist sampling theorem corresponds to requiring that the aliased copies are far enough apart (i.e., the sampling rate is high enough) that we can recover the original frequency domain waveform by filtering the sum of the aliased waveforms.

• The Nyquist criterion for ISI avoidance requires that the samples of the signaling pulse form a discrete delta function, or equivalently that the corresponding sum of the aliased waveforms is a constant.


4.7  Endnotes 

While we use linear modulation in the time domain for our introduction to modulation, an alternative frequency domain approach is to divide the available bandwidth into thin slices, or subcarriers, and to transmit symbols in parallel on each subcarrier. Such a strategy is termed Orthogonal Frequency Division Multiplexing (OFDM), or multicarrier modulation, and we discuss it in more detail in Chapter 7. In contrast, the time domain linear modulation schemes covered here are classified as singlecarrier modulation. In addition to the degrees of freedom provided by time and frequency, additional spatial degrees of freedom can be obtained by employing multiple antennas at the transmitter and receiver, and we provide a glimpse of such Multiple Input Multiple Output (MIMO) techniques in Chapter 8.

While  the  basic  linear  modulation  strategies  discussed  here,  in  either  singlecarrier  or  multicarrier 
modulation  formats,  are  employed  in  many  existing  and  emerging  communication  systems,  it  is 
worth  mentioning  a  number  of  other  strategies  in  which  modulation  with  memory  is  used  to  shape 
the  transmitted  waveform  in  various  ways,  including  insertion  of  spectral  nulls  (e.g.,  line  codes, 
often  used  for  baseband  wireline  transmission),  avoidance  of  long  runs  of  zeros  and  ones  which 
can  disrupt  synchronization  (e.g.,  runlength  constrained  codes,  often  used  for  magnetic  recording 
channels), controlling variations in the signal envelope (e.g., continuous phase modulation), and controlling ISI (e.g., partial response signaling). Memory can also be inserted in the manner that bits are encoded into symbols (e.g., differential encoding for alleviating the need to track a time-varying channel), without changing the basic linear modulation format. The preceding
discussion,  while  not  containing  enough  detail  to  convey  the  underlying  concepts,  is  meant  to 
provide  keywords  to  facilitate  further  exploration,  with  more  advanced  communication  theory 
texts  such  as  [5,  7,  8]  serving  as  a  good  starting  point. 


Problems 


Timelimited  pulses 

Problem 4.1 (Sine pulse) Consider the sine pulse $p(t) = \sin(\pi t)\, I_{[0,1]}(t)$.

(a) Show that its Fourier transform is given by

$$P(f) = \frac{2\cos(\pi f)}{\pi(1 - 4f^2)}\, e^{-j\pi f}$$

(b) Consider the linearly modulated signal $u(t) = \sum_n b[n]\, p(t - n)$, where $b[n]$ are independently
chosen  to  take  values  in  a  QPSK  constellation  (each  point  chosen  with  equal  probability),  and 
the  unit  of  time  is  in  microseconds.  Find  the  95%  power  containment  bandwidth  (specify  the 
units). 

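Problem 4.2 Consider the pulse

$$p(t) = \begin{cases} t/a, & 0 \leq t \leq a, \\ 1, & a \leq t \leq 1 - a, \\ (1 - t)/a, & 1 - a \leq t \leq 1, \\ 0, & \text{else}, \end{cases}$$

where $0 \leq a \leq \frac{1}{2}$.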

(a) Sketch $p(t)$ and find its Fourier transform $P(f)$.

(b) Consider the linearly modulated signal $u(t) = \sum_n b[n]\, p(t - n)$, where $b[n]$ take values independently and with equal probability in a 4-PAM alphabet $\{\pm 1, \pm 3\}$. Find an expression for the PSD of $u$ as a function of the pulse shape parameter $a$.

(c) Numerically estimate the 95% fractional power containment bandwidth for $u$ and plot it as a function of $0 \leq a \leq \frac{1}{2}$. For concreteness, assume the unit of time is 100 picoseconds and specify the units of bandwidth in your plot.


Basic  concepts  in  Nyquist  signaling 

Problem 4.3 Consider a pulse $s(t) = \mathrm{sinc}(at)\,\mathrm{sinc}(bt)$, where $a > b$.

(a) Sketch the frequency domain response $S(f)$ of the pulse.

(b)  Suppose  that  the  pulse  is  to  be  used  over  an  ideal  real  baseband  channel  with  one-sided 
bandwidth  400  Hz.  Choose  a  and  b  so  that  the  pulse  is  Nyquist  for  4-PAM  signaling  at  1200 
bits/sec  and  exactly  fills  the  channel  bandwidth. 

(c)  Now,  suppose  that  the  pulse  is  to  be  used  over  a  passband  channel  spanning  the  frequencies 
2.4-2.42  GHz.  Assuming  that  we  use  64-QAM  signaling  at  60  Mbits/sec,  choose  a  and  b  so  that 
the  pulse  is  Nyquist  and  exactly  fills  the  channel  bandwidth. 

(d)  Sketch  an  argument  showing  that  the  magnitude  of  the  transmitted  waveform  in  the  preceding 
settings  is  always  finite. 
Problem 4.4 Consider the pulse $p(t)$ whose Fourier transform satisfies:

$$P(f) = \begin{cases} 1, & 0 \leq |f| \leq A \\ \frac{B - |f|}{B - A}, & A \leq |f| \leq B \\ 0, & \text{else} \end{cases}$$

where $A = 250$ kHz and $B = 1.25$ MHz.

(a)  True  or  False  The  pulse  p(t)  can  be  used  for  Nyquist  signaling  at  rate  3  Mbps  using  an 
8-PSK  constellation. 

(b)  True  or  False  The  pulse  p{t)  can  be  used  for  Nyquist  signaling  at  rate  4.5  Mbps  using  an 
8-PSK  constellation. 


Problem 4.5 Consider the pulse

$$p(t) = \begin{cases} 1 - \frac{|t|}{T}, & 0 \leq |t| \leq T \\ 0, & \text{else} \end{cases}$$

Let $P(f)$ denote the Fourier transform of $p(t)$.

(a)  True  or  False  The  pulse  p(t)  is  Nyquist  at  rate 

(b) True or False The pulse $p(t)$ is square root Nyquist at rate (i.e., $|P(f)|^2$ is Nyquist at rate).


[Figure 4.18: spectrum $P(f)$ of the signaling pulse in Problem 4.6.]


Problem  4.6  Consider  Nyquist  signaling  at  80  Mbps  using  a  16QAM  constellation  with  50% 
excess  bandwidth.  The  signaling  pulse  has  spectrum  shown  in  Figure  4.18. 

(a) Find the values of $a$ and $b$ in the figure, making sure you specify the units.

(b)  True  or  False  The  pulse  is  also  Nyquist  for  signaling  at  20  Mbps  using  QPSK.  (Justify  your 
answer.) 

Problem 4.7 Consider linear modulation with a signaling pulse $p(t) = \mathrm{sinc}(at)\,\mathrm{sinc}(bt)$, where $a$ and $b$ are to be determined.

(a)  How  should  a  and  b  be  chosen  so  that  p(t)  is  Nyquist  with  50%  excess  bandwidth  for  a  data 
rate  of  40  Mbps  using  16QAM?  Specify  the  occupied  bandwidth. 

(b)  How  should  a  and  b  be  chosen  so  that  p(t)  can  be  used  for  Nyquist  signaling  both  for  a 
16QAM  system  with  40  Mbps  data  rate,  and  for  an  8PSK  system  with  18  Mbps  data  rate? 
Specify  the  occupied  bandwidth. 
Problem 4.8 Consider a passband communication link operating at a bit rate of 16 Mbps using a 256-QAM constellation.

(a) What must we set the unit of time as so that $p(t) = \sin(\pi t)\, I_{[0,1]}(t)$ is square root Nyquist for the system of interest, while occupying the smallest possible bandwidth?

(b) What must we set the unit of time as so that $p(t) = \mathrm{sinc}(t)\,\mathrm{sinc}(2t)$ is Nyquist for the system of interest, while occupying the smallest possible bandwidth?


Problem 4.9 Consider passband linear modulation with a pulse of the form $p(t) = \mathrm{sinc}(3t)\,\mathrm{sinc}(2t)$, where the unit of time is microseconds.

(a) Sketch the spectrum $P(f)$ versus $f$. Make sure you specify the units on the $f$ axis.

(b) What is the largest achievable bit rate for Nyquist signaling using $p(t)$ if we employ a 16QAM constellation? What is the fractional excess bandwidth for this bit rate?

(c) (True or False) The pulse $p(t)$ can be used for Nyquist signaling at a bit rate of 4 Mbps using a QPSK constellation.

Problem  4.10  (True  or  False)  Any  pulse  timelimited  to  duration  T  is  square  root  Nyquist 
(up  to  scaling)  at  rate  1/T. 


Problem 4.11 (Raised cosine pulse) In this problem, we derive the time domain response of the frequency domain raised cosine pulse. Let $R(f) = I_{[-\frac{1}{2}, \frac{1}{2}]}(f)$ denote an ideal boxcar transfer function, and let $C(f) = \frac{\pi}{2a}\cos\left(\frac{\pi f}{a}\right) I_{[-\frac{a}{2}, \frac{a}{2}]}(f)$ denote a cosine transfer function.

(a) Sketch $R(f)$ and $C(f)$, assuming that $0 < a < 1$.

(b) Show that the frequency domain raised cosine pulse can be written as

$$S(f) = (R * C)(f)$$

(c) Find the time domain pulse $s(t) = r(t)c(t)$. Where are the zeros of $s(t)$? Conclude that $s(t/T)$ is Nyquist at rate $1/T$.

(d)  Sketch  an  argument  that  shows  that,  if  the  pulse  s(t/T)  is  used  for  BPSK  signaling  at  rate 
1/T,  then  the  magnitude  of  the  transmitted  waveform  is  always  finite. 


Software  experiments  with  Nyquist  and  square  root  Nyquist  pulses 

Problem 4.12 (Software exercise for the raised cosine pulse) Code fragment 4.B.1 in the appendix implements a discrete time truncated raised cosine pulse.

(a) Run the code fragment for 25%, 50% and 100% excess bandwidths and plot the time domain waveforms versus normalized time $t/T$ over the interval $[-5T, 5T]$, sampling fast enough (e.g., at rate $32/T$ or higher) to obtain smooth curves. Comment on the effect of varying the excess bandwidth on these waveforms.

(b) For excess bandwidth of 50%, numerically explore the effect of time domain truncation on frequency domain spillage. Specifically, compute the Fourier transform for two cases: truncation to $[-2T, 2T]$ and truncation to $[-5T, 5T]$, using the DFT as described in code fragment 2.5.1 to obtain a frequency resolution at least as good as Plot these Fourier transforms against the normalized frequency $fT$, and comment on how much of an increase in bandwidth, if any, you see due to truncation in the two cases.

(c)  Numerically  compute  the  95%  bandwidth  of  the  two  pulses  in  (b),  and  compare  it  with  the 
nominal  bandwidth  without  truncation. 

Problem  4.13  (Software  exercise  for  the  SRRC  pulse)  (a)  Write  a  function  for  generating 
a  sampled  SRRC  pulse,  analogous  to  code  fragment  4.B.1,  where  you  can  specify  the  sampling 
rate, the excess bandwidth, and the truncation length. The time domain expression for the SRRC pulse is given by (4.45) in the appendix.

Remark: The zero in the denominator can be handled either by analytical or numerical implementation of L’Hospital’s rule. See comments in code fragment 4.B.1.

(b)  Plot  the  SRRC  pulses  versus  normalized  time  t/T,  for  excess  bandwidths  of  25%,  50%  and 
100%.  Comment  on  the  effect  of  varying  excess  bandwidth  on  these  waveforms. 



Effect  of  timing  errors 


Problem 4.14 (Effect of timing errors) Consider digital modulation at rate $1/T$ using the sinc pulse $s(t) = \mathrm{sinc}(2Wt)$, with transmitted waveform

$$y(t) = \sum_{n=1}^{100} b_n\, s(t - (n-1)T)$$

where $1/T$ is the symbol rate and $\{b_n\}$ is the bit stream being sent (assume that each $b_n$ takes one of the values $\pm 1$ with equal probability). The receiver makes bit decisions based on the samples $r_n = y((n-1)T)$, $n = 1, \dots, 100$.

(a) For what value of $T$ (as a function of $W$) is $r_n = b_n$, $n = 1, \dots, 100$?

Remark: In this case, we simply use the sign of the $n$th sample as an estimate of $b_n$.

(b) For the choice of $T$ as in (a), suppose that the receiver sampling times are off by $0.25T$. That is, the $n$th sample is given by $r_n = y((n-1)T + 0.25T)$, $n = 1, \dots, 100$. In this case, we do have ISI of different degrees of severity, depending on the bit pattern. Consider the following bit pattern:

$$b_n = \begin{cases} (-1)^{n-1}, & 1 \leq n \leq 49 \\ (-1)^n, & 50 \leq n \leq 100 \end{cases}$$

Numerically evaluate the 50th sample $r_{50}$. Does it have the same sign as the 50th bit $b_{50}$?
Remark: The preceding bit pattern creates the worst possible ISI for the 50th bit. Since the sinc pulse dies off slowly with time, the ISI contributions due to the 99 other bits to the 50th sample sum up to a number larger in magnitude, and opposite in sign, relative to the contribution due to $b_{50}$. A decision on $b_{50}$ based on the sign of $r_{50}$ would therefore be wrong. This sensitivity to timing error is why the sinc pulse is seldom used in practice.

(c) Now, consider the digitally modulated signal in (a) with the pulse $s(t) = \mathrm{sinc}(2Wt)\,\mathrm{sinc}(Wt)$. For ideal sampling as in (a), what are the two values of $T$ such that $r_n = b_n$?

(d) For the smaller of the two values of $T$ found in (c) (which corresponds to faster signaling, since the symbol rate is $1/T$), repeat the computation in (b). That is, find $r_{50}$ and compare its sign with $b_{50}$ for the bit pattern in (b).

(e)  Find  and  sketch  the  frequency  response  of  the  pulse  in  (c).  What  is  the  excess  bandwidth 
relative  to  the  pulse  in  (a),  assuming  Nyquist  signaling  at  the  same  symbol  rate? 

(f)  Discuss  the  impact  of  the  excess  bandwidth  on  the  severity  of  the  ISI  due  to  timing  mismatch. 
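The worst-case claim in the remark to part (b) of Problem 4.14 can be checked with a short computation. The Python sketch below normalizes $W = 1$, uses $T = 1/(2W)$ (the choice that makes $r_n = b_n$ with ideal timing), and evaluates $r_{50}$ for the stated bit pattern with and without the $0.25T$ timing offset; the helper names are hypothetical.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


W = 1.0                  # assumed normalization
T = 1.0 / (2.0 * W)      # makes s(t) = sinc(2Wt) Nyquist at rate 1/T

# Bit pattern from part (b): b_n = (-1)^(n-1) for n <= 49 and (-1)^n for n >= 50.
b = [(-1) ** (n - 1) if n <= 49 else (-1) ** n for n in range(1, 101)]


def sample(n, offset):
    """r_n = y((n-1)T + offset*T), with y(t) = sum_m b_m sinc(2W(t - (m-1)T))."""
    t = (n - 1) * T + offset * T
    return sum(b[m - 1] * sinc(2 * W * (t - (m - 1) * T)) for m in range(1, 101))


r50_ideal = sample(50, 0.0)
r50_offset = sample(50, 0.25)
desired = b[49] * sinc(0.25)      # contribution of b_50 itself
print(r50_ideal, r50_offset, desired, r50_offset - desired)
```

With ideal timing the sample returns $b_{50} = 1$, while with the offset the accumulated ISI outweighs the desired term $\mathrm{sinc}(0.25) \approx 0.90$ and $r_{50}$ comes out negative, as the remark states.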
Figure 4.19: 16QAM constellation with scaling chosen for convenient computation of power
efficiency. 


Power-bandwidth  tradeoffs 

Problem  4.15  (Power  efficiency  of  16QAM)  In  this  problem,  we  sketch  the  computation 
of  power  efficiency  for  the  16QAM  constellation  shown  in  Figure  4.19. 

(a) Note that the minimum distance for the particular scaling chosen in the figure is $d_{\min} = 2$.

(b) Show that the constellation points divide into 3 categories based on their distance from the origin, corresponding to squared distances, or energies, of $1^2 + 1^2$, $1^2 + 3^2$ and $3^2 + 3^2$. Averaging over these energies (weighting by the number of points in each category), show that the average energy per symbol is $E_s = 10$.

(c) Using (a) and (b), and accounting for the number of bits/symbol, show that the power efficiency is given by $\eta_P = \frac{d_{\min}^2}{E_b} = \frac{8}{5}$.

Problem  4.16  (Power-bandwidth  tradeoffs)  A  16QAM  system  transmits  at  50  Mbps  using 
an  excess  bandwidth  of  50%.  The  transmit  power  is  100  mW. 

(a)  Assuming  that  the  carrier  frequency  is  5.2  GHz,  specify  the  frequency  interval  occupied  by 
the  passband  modulated  signal. 

(b)  Using  the  same  frequency  band  in  (a),  how  fast  could  you  signal  using  QPSK  with  the  same 
excess  bandwidth? 

(c)  Estimate  the  transmit  power  needed  in  the  QPSK  system,  assuming  the  same  range  and 
reliability  requirements  as  in  the  16QAM  system. 


Minimum  Shift  Keying 

Problem  4.17  (OQPSK  and  MSK)  Linear  modulation  with  a  bandlimited  pulse  can  perform 
poorly  over  nonlinear  passband  channels.  For  example,  the  output  of  a  passband  hardlimiter 
(which  is  a  good  model  for  power  amplifiers  operating  in  a  saturated  regime)  has  constant 
envelope,  but  a  PSK  signal  employing  a  bandlimited  pulse  has  an  envelope  that  passes  through 
zero  during  a  180  degree  phase  transition,  as  shown  in  Figure  4.20.  One  way  to  alleviate  this 
problem  is  to  not  allow  180  degree  phase  transitions.  Offset  QPSK  (OQPSK)  is  one  example  of 
such  a  scheme,  where  the  transmitted  signal  is  given  by 

$$s(t) = \sum_{n=-\infty}^{\infty} \left( b_c[n]\, p(t - nT) + j\, b_s[n]\, p\left(t - nT - \frac{T}{2}\right) \right) \qquad (4.26)$$

where $\{b_c[n]\}$ and $\{b_s[n]\}$ are $\pm 1$ BPSK symbols modulating the I and Q channels, with the I and Q signals being staggered by half a symbol interval. This leads to phase transitions of at most 90 degrees at integer multiples of the bit time $T_b = \frac{T}{2}$.

[Figure 4.20 annotation: "Envelope is zero due to 180 degrees phase transition".]

Figure 4.20: The envelope of a PSK signal passes through zero during a 180 degree phase transition, and gets distorted over a nonlinear channel.

Minimum Shift Keying (MSK) is a special case of OQPSK with timelimited modulating pulse

$$p(t) = \sqrt{2}\,\sin\left(\frac{\pi t}{T}\right) I_{[0,T]}(t) \qquad (4.27)$$

(a)  Sketch  the  I  and  Q  waveforms  for  a  typical  MSK  signal,  clearly  showing  the  timing  relationship 
between  the  waveforms. 

(b)  Show  that  the  MSK  waveform  has  constant  envelope  (an  extremely  desirable  property  for 
nonlinear  channels). 

(c)  Find  an  analytical  expression  for  the  PSD  of  an  MSK  signal,  assuming  that  all  bits  sent  are 
i.i.d.,  taking  values  ±1  with  equal  probability.  Plot  the  PSD  versus  normalized  frequency  fT. 

(d)  Find  the  99%  power  containment  normalized  bandwidth  of  MSK.  Compare  with  the  minimum 
Nyquist  bandwidth,  and  the  99%  power  containment  bandwidth  of  OQPSK  using  a  rectangular 
pulse. 

(e)  Recognize  that  Figure  4.6  gives  the  PSD  for  OQPSK  and  MSK,  and  reproduce  this  figure, 
normalizing  the  area  under  the  PSD  curve  to  be  the  same  for  both  modulation  formats. 


Orthogonal  signaling 

Problem 4.18 (FSK tone spacing) Consider two real-valued passband pulses of the form

$$s_0(t) = \cos(2\pi f_0 t + \phi_0), \quad 0 \leq t \leq T$$
$$s_1(t) = \cos(2\pi f_1 t + \phi_1), \quad 0 \leq t \leq T$$

where $f_1 > f_0 \gg 1/T$. The pulses are said to be orthogonal if $\langle s_0, s_1\rangle = \int_0^T s_0(t)s_1(t)\,dt = 0$.

(a) If $\phi_0 = \phi_1 = 0$, show that the minimum frequency separation such that the pulses are orthogonal is $f_1 - f_0 = \frac{1}{2T}$.

(b) If $\phi_0$ and $\phi_1$ are arbitrary phases, show that the minimum separation for the pulses to be orthogonal regardless of $\phi_0, \phi_1$ is $f_1 - f_0 = 1/T$.

Remark:  The  results  of  this  problem  can  be  used  to  determine  the  bandwidth  requirements  for 
coherent  and  noncoherent  FSK,  respectively. 


Problem  4.19  (Walsh-Hadamard  codes) 

(a)  Specify  the  Walsh-Hadamard  codes  for  8-ary  orthogonal  signaling  with  noncoherent  reception. 

(b)  Plot  the  baseband  waveforms  corresponding  to  sending  these  codes  using  a  square  root  raised 
cosine  pulse  with  excess  bandwidth  of  50%. 

(c)  What  is  the  fractional  increase  in  bandwidth  efficiency  if  we  use  these  8  waveforms  as  building 
blocks  for  biorthogonal  signaling  with  coherent  reception? 
Figure 4.21: Baseband signals for Problem 4.20.


Problem  4.20  The  two  orthogonal  baseband  signals  shown  in  Figure  4.21  are  used  as  building 
blocks  for  constructing  passband  signals  as  follows. 

$$u_p(t) = a(t)\cos 2\pi f_c t - b(t)\sin 2\pi f_c t$$
$$v_p(t) = b(t)\cos 2\pi f_c t - a(t)\sin 2\pi f_c t$$
$$w_p(t) = b(t)\cos 2\pi f_c t + a(t)\sin 2\pi f_c t$$
$$x_p(t) = a(t)\cos 2\pi f_c t + b(t)\sin 2\pi f_c t$$

where $f_c \gg 1$.

(a)  True  or  False  The  signal  set  can  be  used  for  4-ary  orthogonal  modulation  with  coherent 
demodulation. 

(b)  True  or  False  The  signal  set  can  be  used  for  4-ary  orthogonal  modulation  with  noncoherent 
demodulation. 


Bandwidth  occupancy  as  a  function  of  modulation  format 

Problem 4.21 We wish to send at a rate of 10 Mbits/sec over a passband channel. Assuming that an excess bandwidth of 50% is used, how much bandwidth is needed for each of the following schemes: QPSK, 64-QAM, and 64-ary noncoherent orthogonal modulation using a Walsh-Hadamard code?

Problem  4.22  Consider  64-ary  orthogonal  signaling  using  Walsh-Hadamard  codes.  Assuming 
that  the  chip  pulse  is  square  root  raised  cosine  with  excess  bandwidth  25%,  what  is  the  bandwidth 
required  for  sending  data  at  20  Kbps  over  a  passband  channel  assuming  (a)  coherent  reception, 
(b)  noncoherent  reception. 


Software  Lab  4.1:  Linear  modulation  over  a  noiseless  ideal  channel 

This is the first of a sequence of software labs which gradually develop a reasonably complete Matlab simulator for a linearly modulated system (the follow-on labs are in Chapters 6 and 7).

Background 

Figure 4.22 shows block diagrams corresponding to a typical DSP-centric realization of a communication transceiver employing linear modulation. In the labs, we model the core components of such a system using the complex baseband representation, as shown in Figure 4.23. Given the equivalence of passband and complex baseband, we are only skipping the modeling of finite-precision effects due to digital-to-analog conversion (DAC) and analog-to-digital conversion (ADC).
Figure 4.22: Typical DSP-centric transceiver realization. Our model does not include the blocks
shown  in  dashed  lines.  Finite  precision  effects  such  as  DAC  and  ADC  are  not  considered.  The 
upconversion  and  downconversion  operations  are  not  modeled.  The  passband  channel  is  modeled 
as  an  LTI  system  in  complex  baseband. 
Figure 4.23: Block diagram of a linearly modulated system, modeled in complex baseband.
These effects can easily be incorporated into Matlab models such as those we develop, but are beyond our current scope.

A few points worth noting about the model of Figure 4.23:

Choice of transmit filter: The PSD of the transmitted signal is proportional to $|G_{TX}(f)|^2$ (see Chapter 4). The choice of transmit filter is made based on spectral constraints, as well as considerations such as sensitivity to receiver timing errors and intersymbol interference. Typically, the bandwidth employed is of the order of $1/T$.

Channel model: We typically model the channel as a linear time-invariant (LTI) system. For certain applications, such as wireless communications, the channel may be modeled as slowly time varying.

Noise model: Noise is introduced in a later lab (in Chapter 6).

Receive filter and sampler: The optimal choice of receive filter is actually a filter matched to the cascade of the transmit filter and the channel. In this case, there is no information loss in sampling the output of the receive filter at the symbol rate $1/T$. Often, however, we use a suboptimal choice of receive filter (e.g., a wideband filter that is flat over the signal band, or a filter matched to the transmit filter). In this case, it is typically advantageous to sample faster than the symbol rate. In general, we assume that the sampler operates at rate $m/T$, where $m$ is a positive integer. The output of the sampler is then processed, typically using digital signal processing (DSP), to perform receiver functions such as synchronization, equalization and demodulation.

The simulation of a linearly modulated system typically involves the following steps.

Step  1:  Generating  random  symbols  to  be  sent 

We restrict attention in this lab to Binary Phase Shift Keying (BPSK). That is, the symbols $\{b[n]\}$ in Figure 4.23 take values $\pm 1$.

Step 2: Implementing the transmit, channel, and receive filters

Since the bandwidth of these filters is of the order of $1/T$, they can be accurately implemented in DSP by using FIR filters operating on samples at a rate which is a suitable multiple of $1/T$. The default choice of sampling rate in the labs is $4/T$, unless specified otherwise. If the filter is specified in continuous time, typically, one simply samples the impulse response at rate $m/T$, taking a large enough filter length to capture most of the energy in the impulse response. Code Fragment 4.B.1 in the appendix illustrates generating a discrete time filter corresponding to a truncated raised cosine pulse.

Step 3: Sending the symbols through the filters

To send symbols at rate $1/T$ through filters implemented at rate $m/T$, it is necessary to upsample the symbols before convolving them with the filter impulse response determined in Step 2. Code Fragment 4.B.2 in the appendix illustrates this for a raised cosine pulse.

Step 4: Adding noise

Typically, we add white Gaussian noise (model to be specified in a later lab) at the input to the receive filter.

Step 5: Processing at the receive filter output

If there is no intersymbol interference (ISI), the processing simply consists of sampling at rate $1/T$ to get decision statistics for the symbols of interest. For BPSK, you might simply take the sign of the decision statistic to make your bit decision.

If the ISI is significant, then channel equalization (discussed in a later lab) is required prior to making symbol decisions.

Laboratory  Assignment 
See Appendix 4.B for information that may be useful for this lab.

0) Write a Matlab function analogous to Code Fragment 4.B.1 to generate an SRRC pulse (i.e., do Problem 4.13(a)), where you can specify the truncation length and the excess bandwidth.

1) Set the transmit filter to an SRRC pulse with excess bandwidth 22%, sampled at rate $4/T$ and truncated to $[-5T, 5T]$. Plot the impulse response of the transmit filter versus $t/T$.

If you have difficulty generating the SRRC pulse, use the following code fragment to generate the transmit filter:

Code  Fragment  4.7.1  (Explicit  specification  of  transmit  filter) 

% first specify half of the filter
hhalf = [-0.025288315; -0.034167931; -0.035752323; -0.016733702; 0.021602514;
0.064938487; 0.091002137; 0.081894974; 0.037071157; -0.021998074; -0.060716277;
-0.051178658; 0.007874526; 0.084368728; 0.126869306; 0.094528345; -0.012839661;
-0.143477028; -0.211829088; -0.140513128; 0.094601918; 0.441387140; 0.785875640;
1.0];

% the full (symmetric) filter is the half followed by its mirror image
transmit_filter = [hhalf; flipud(hhalf)];

2) Using the DFT (as in Code Fragment 2.5.1 for Example 2.5.4), compute the magnitude of the transfer function of the transmit filter versus the normalized frequency $fT$ (make sure the resolution in frequency is good enough to get a smooth plot, e.g., at least as good as ^). By eyeballing the plot, check whether the normalized bandwidth (i.e., bandwidth as a multiple of $1/T$) is well predicted by the nominal excess bandwidth.

3) Use the transmit filter in Code Fragment 4.B.2, which implements upsampling and allows sending a programmable number of symbols through the system. Set the receive filter to be the matched filter corresponding to the transmit filter, and plot the response at the output of the receive filter to a single symbol. Is the cascade of the transmit and receive filters Nyquist at rate $1/T$?

4) Generate 100 random bits $\{a[n]\}$ taking values in $\{0, 1\}$, and map them to symbols $\{b[n]\}$ taking values in $\{-1, +1\}$, with 0 mapped to $+1$ and 1 to $-1$.

5) Send the 100 symbols $\{b[n]\}$ through the system. What is the length of the corresponding output of the transmit filter? What is the length of the corresponding output of the receive filter? Plot separately the input to the receive filter and the output of the receive filter versus time, with one unit of time on the x-axis equal to the symbol time $T$.

6) Do the best job you can in recovering the transmitted bits $\{a[n]\}$ by directly sampling the input to the receive filter, and add lines in the Matlab code for implementing your idea. That is, select a set of 100 samples, and estimate the 100 transmitted bits based on the sign of these samples. (What sampling delay and spacing would you use?) Estimate the probability of error (note: no noise has been added).

7) Do the best job you can in recovering the transmitted bits by directly sampling the output of the receive filter, and add lines in the Matlab code for implementing your idea. That is, select a set of 100 samples, and estimate the 100 transmitted bits based on the sign of these samples. (What sampling delay and spacing would you use?) Estimate the probability of error. Also estimate the probability of error if you chose an incorrect delay, offset from the correct delay by $T/2$.

8) Suppose that the receiver LO used for downconversion is ahead of the incoming wave in frequency by $\Delta f =$ ^ and in phase by $\pi/2$. Modify your complex baseband model to include the effects of the carrier phase and frequency offset. When you now sample at the “correct” delay as determined in 7), do a scatter plot of the complex-valued samples $\{y[n], n = 1, \ldots, 100\}$ that you obtain. Can you make correct decisions based on taking the sign of the real part of the samples, as in 7)?
9) Now consider a differentially encoded system in which we send the bits $\{a[n]\}$, where $a[n] \in \{0, 1\}$, by sending the following $\pm 1$ symbols: $b[1] = +1$, and for $n = 2, \ldots, 100$,

$$b[n] = \begin{cases} b[n-1], & a[n] = 0 \\ -b[n-1], & a[n] = 1 \end{cases}$$

Devise estimates for the bits $\{a[n]\}$ from the samples $\{y[n]\}$ in 8), and estimate the probability of error.

Hint: What does $y[n]y^*[n-1]$ look like?

Lab Report: Your lab report should answer the preceding questions in order, and should document the reasoning you used and the difficulties you encountered. Comment on whether you get a better error probability in 6) or in 7), and explain why.


4.A Power spectral density of a linearly modulated signal

We wish to compute the PSD of a linearly modulated signal of the form

$$u(t) = \sum_n b[n]\,p(t - nT)$$

While we model the complex-valued symbol sequence $\{b[n]\}$ as random, we do not need to invoke concepts from probability and random processes to compute the PSD, but can simply model time-averaged quantities for the symbol sequence. For example, the DC value, which is typically designed to be zero, is defined by

$$\overline{b[n]} = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} b[n] \qquad (4.28)$$


We also define the time-averaged autocorrelation function $R_b[k] = \overline{b[n]b^*[n-k]}$ for the symbol sequence as the following limit:

$$R_b[k] = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} b[n]\,b^*[n-k] \qquad (4.29)$$

Note that we are being deliberately sloppy about the limits of summation in $n$ on the right-hand side to avoid messy notation. Actually, since $-N \le m = n - k \le N$, we have the constraint $-N + k \le n \le N + k$ in addition to the constraint $-N \le n \le N$. Thus, the summation in $n$ should depend on the delay $k$ at which we are evaluating the autocorrelation function, going from $n = -N$ to $n = N + k$ for $k < 0$, and from $n = -N + k$ to $n = N$ for $k > 0$. However, we ignore these edge effects, since they become negligible when we let $N$ get large while keeping $k$ fixed.
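The finite-$N$ averages behind (4.28) and (4.29) can be illustrated with a short sketch, written in Python (standard library only) rather than the Matlab used in the labs. The helper names `time_average_mean` and `time_average_autocorr` are hypothetical, and the i.i.d. BPSK symbol model, the block length $2N+1 = 40001$, and the random seed are illustrative assumptions. Edge terms are dropped exactly as described above, while the sum is still normalized by the block length.

```python
import random

def time_average_mean(b):
    """Finite-N version of the DC value (4.28): average of b[n] over the block."""
    return sum(b) / len(b)

def time_average_autocorr(b, k):
    """Finite-N version of R_b[k] in (4.29): average of b[n] * conj(b[n-k]).

    Only indices n for which both b[n] and b[n-k] lie inside the block are
    summed (the edge effects discussed above); the normalization is still by
    the block length, which makes no difference as the block grows with k fixed.
    """
    lo = max(k, 0)
    hi = len(b) + min(k, 0)
    total = sum(b[n] * b[n - k].conjugate() for n in range(lo, hi))
    return total / len(b)

# Hypothetical example: 40001 i.i.d. BPSK symbols taking values +1 or -1.
rng = random.Random(1)
b = [rng.choice((-1, 1)) for _ in range(40001)]
print(time_average_mean(b))          # close to 0
print(time_average_autocorr(b, 0))   # exactly 1.0, since |b[n]|^2 = 1
print(time_average_autocorr(b, 1))   # close to 0 for uncorrelated symbols
```

For complex symbols (e.g., QPSK), the same functions apply unchanged, since `conjugate()` is defined for Python's `int`, `float`, and `complex` types.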

We now compute the time-averaged PSD. As described in Section 4.2.1, the steps for computing the PSD for a finite-power signal $u(t)$ are as follows:

(a) timelimit to a finite observation interval of length $T_0$ to get a finite energy signal $u_{T_0}(t)$;

(b) compute the Fourier transform $U_{T_0}(f)$, and hence obtain the energy spectral density $|U_{T_0}(f)|^2$;

(c) estimate the PSD as $\hat{S}_u(f) = \frac{|U_{T_0}(f)|^2}{T_0}$, and take the limit $T_0 \to \infty$ to obtain $S_u(f)$.

Consider the observation interval $[-NT, NT]$, which fits roughly $2N$ symbols. In general, the modulation pulse $p(t)$ need not be timelimited to the symbol duration $T$. However, we can neglect the edge effects caused by this, since we eventually take the limit as the observation interval gets large. Thus, we can write

$$u_{T_0}(t) \approx \sum_{n=-N}^{N} b[n]\,p(t - nT)$$

Taking the Fourier transform, we obtain

$$U_{T_0}(f) = \sum_{n=-N}^{N} b[n]\,P(f)\,e^{-j2\pi fnT}$$

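The energy spectral density is therefore given by

$$|U_{T_0}(f)|^2 = U_{T_0}(f)\,U_{T_0}^*(f) = \left(\sum_{n=-N}^{N} b[n]\,P(f)\,e^{-j2\pi fnT}\right)\left(\sum_{m=-N}^{N} b^*[m]\,P^*(f)\,e^{j2\pi fmT}\right)$$

where we need to use two different dummy variables, $n$ and $m$, for the summations corresponding to $U_{T_0}(f)$ and $U_{T_0}^*(f)$, respectively. Thus,

$$|U_{T_0}(f)|^2 = |P(f)|^2 \sum_{m=-N}^{N}\sum_{n=-N}^{N} b[n]\,b^*[m]\,e^{-j2\pi f(n-m)T}$$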


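and the PSD is estimated as

$$\hat{S}_u(f) = \frac{|U_{T_0}(f)|^2}{2NT} = \frac{|P(f)|^2}{T}\left\{\frac{1}{2N}\sum_{m=-N}^{N}\sum_{n=-N}^{N} b[n]\,b^*[m]\,e^{-j2\pi f(n-m)T}\right\} \qquad (4.30)$$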


Thus, the PSD factors into two components: the first is a term $\frac{|P(f)|^2}{T}$ that depends only on the spectrum of the modulation pulse $p(t)$, while the second term (in curly brackets) depends only on the symbol sequence $\{b[n]\}$. Let us now work on simplifying the latter. Grouping terms of the form $m = n - k$ for each fixed $k$, we can rewrite this term as

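$$\frac{1}{2N}\sum_{m=-N}^{N}\sum_{n=-N}^{N} b[n]\,b^*[m]\,e^{-j2\pi f(n-m)T} = \sum_k \left(\frac{1}{2N}\sum_{n=-N}^{N} b[n]\,b^*[n-k]\right) e^{-j2\pi fkT} \qquad (4.31)$$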


From (4.29), we see that taking the limit $N \to \infty$ in (4.31) yields $\sum_k R_b[k]\,e^{-j2\pi fkT}$. Substituting into (4.30), we obtain that the PSD is given by

$$S_u(f) = \frac{|P(f)|^2}{T}\sum_k R_b[k]\,e^{-j2\pi fkT} \qquad (4.32)$$

Thus, we see that the PSD depends both on the modulating pulse $p(t)$ and on the properties of the symbol sequence $\{b[n]\}$. We explore how the dependence on the symbol sequence can be exploited for shaping the spectrum in the problems. However, for most systems, the symbol sequence can be modeled as uncorrelated and zero mean. In this case, $R_b[k] = 0$ for $k \ne 0$. Specializing to this important setting yields Theorem 4.2.1.
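To make (4.32) concrete, the following Python sketch evaluates the symbol-sequence factor $\sum_k R_b[k] e^{-j2\pi fkT}$ from a table of autocorrelation values. The correlated sequence used in the example, $b[n] = a[n] + a[n-1]$ with $\{a[n]\}$ uncorrelated, zero mean, and of unit power, is a hypothetical illustration of spectral shaping through the symbol sequence: it has $R_b[0] = 2$, $R_b[\pm 1] = 1$, and $R_b[k] = 0$ otherwise, so the factor equals $2 + 2\cos(2\pi fT) = 4\cos^2(\pi fT)$, which places a spectral null at $f = 1/(2T)$. The rectangular pulse is likewise an assumption chosen only for the example, and all function names are illustrative.

```python
import cmath
import math

def sequence_factor(rb, f, T=1.0):
    """Symbol-sequence factor in (4.32): sum over k of R_b[k] exp(-j 2 pi f k T).

    rb is a dict mapping lag k to R_b[k]; lags not listed are taken as zero.
    Since R_b[-k] = conj(R_b[k]), the sum is real, so its real part is returned.
    """
    total = sum(r * cmath.exp(-2j * math.pi * f * k * T) for k, r in rb.items())
    return total.real

def psd(rb, pulse_energy_spectrum, f, T=1.0):
    """S_u(f) = |P(f)|^2 / T times the sequence factor, as in (4.32)."""
    return pulse_energy_spectrum(f) / T * sequence_factor(rb, f, T)

def rect_energy_spectrum(f, T=1.0):
    """|P(f)|^2 for a rectangular pulse of duration T (illustrative choice)."""
    x = math.pi * f * T
    return T * T if x == 0 else (T * math.sin(x) / x) ** 2

uncorrelated = {0: 1.0}                  # R_b[k] = 0 for k != 0
correlated = {-1: 1.0, 0: 2.0, 1: 1.0}   # b[n] = a[n] + a[n-1]

for f in (0.0, 0.25, 0.5):
    print(f, psd(uncorrelated, rect_energy_spectrum, f),
          psd(correlated, rect_energy_spectrum, f))
```

With `uncorrelated`, the sequence factor is 1 at every frequency and the PSD reduces to $|P(f)|^2/T$; with `correlated`, the PSD vanishes at $f = 1/(2T)$ regardless of the pulse.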
4.B Simulation resource: bandlimited pulses and upsampling


The  discussion  in  this  appendix  should  be  helpful  for  Software  Lab  4.1.  In  order  to  simulate  a 
linearly  modulated  system,  we  must  specify  the  transmit  and  receive  filters,  typically  chosen  so 
that  their  cascade  is  Nyquist  at  the  symbol  rate.  As  mentioned  earlier,  there  are  two  popular 
choices. One choice is to set the transmit filter to a Nyquist pulse, and the receive filter to a
wideband  pulse  that  has  response  roughly  flat  over  the  band  of  interest.  Another  is  to  set  the 
transmit  and  receive  filters  to  be  square  roots  (in  the  frequency  domain)  of  a  Nyquist  pulse.  We 
discuss  software  implementations  of  both  choices  here. 

Consider  the  raised  cosine  pulse,  which  is  the  most  common  choice  for  bandlimited  Nyquist  pulses. 
Setting  the  symbol  rate  1/T  =  1  without  loss  of  generality  (this  is  equivalent  to  expressing  all 
results in terms of $t/T$ or $fT$), this pulse is given by


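$$P(f) = \begin{cases} 1, & 0 \le |f| \le \frac{1-a}{2} \\ \frac{1}{2}\left[1 + \cos\left(\frac{\pi}{a}\left(|f| - \frac{1-a}{2}\right)\right)\right], & \frac{1-a}{2} \le |f| \le \frac{1+a}{2} \\ 0, & \text{else} \end{cases} \qquad (4.33)$$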


The corresponding time domain pulse is given by

$$p(t) = \mathrm{sinc}(t)\,\frac{\cos \pi a t}{1 - 4a^2t^2} \qquad (4.34)$$

where $0 \le a \le 1$ denotes the excess bandwidth. When generating a sampled version of this pulse, we must account for the zero in the denominator at $t = \pm\frac{1}{2a}$. An example Matlab function for generating a sampled version of the raised cosine pulse is provided below. It is left as an exercise to show, using L'Hospital's rule, that the 0/0 form taken by $\frac{\cos \pi a t}{1 - 4a^2t^2}$ at these times evaluates to $\frac{\pi}{4}$.

Code  Fragment  4.B.1  (Sampled  raised  cosine  pulse) 

% time domain pulse for raised cosine, together with time vector to plot it against
% oversampling factor = how much faster than the symbol rate we sample at
% length = where to truncate response (multiple of symbol time) on each side of peak
% a = excess bandwidth

function [rc, time_axis] = raised_cosine(a, m, length)

length_os = floor(length*m); % number of samples on each side of peak
% time vector (in units of symbol interval) on one side of the peak
z = cumsum(ones(length_os, 1))/m;

A = sin(pi*z)./(pi*z); % term 1
B = cos(pi*a*z); % term 2
C = 1 - (2*a*z).^2; % term 3

zerotest = m/(2*a); % location (sample index) of zero in denominator
% Check whether any sample coincides with zero location; also require that it
% lies inside the truncation window (this also skips a = 0, where zerotest = Inf)
if (zerotest == floor(zerotest)) && (zerotest <= length_os)
    B(zerotest) = pi*a;
    C(zerotest) = 4*a; % B/C = pi/4, the L'Hospital limit
    % alternative is to perturb around the sample
    % (find L'Hospital limit numerically)
    % B(zerotest) = cos(pi*a*(z(zerotest)+0.001));
    % C(zerotest) = 1-(2*a*(z(zerotest)+0.001))^2;
end

D = (A.*B)./C; % response to one side of peak
rc = [flipud(D); 1; D]; % add in peak and other side
time_axis = [flipud(-z); 0; z];

This  can,  for  example,  be  used  to  generate  a  plot  of  the  raised  cosine  pulse,  as  follows,  where  we 
would  typically  oversample  by  a  large  factor  (e.g.,  m  =  32)  in  order  to  get  a  smooth  plot. 

%% plot time domain raised cosine pulse
a = 0.5; % desired excess bandwidth
m = 32; % oversample by a lot to get smooth plot
length = 10; % where to truncate the time domain response (one-sided, multiple of symbol time)
[rc, time] = raised_cosine(a, m, length);
plot(time, rc);
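For readers who want to cross-check Code Fragment 4.B.1 outside Matlab, the following is a minimal Python port (standard library only; the function name `raised_cosine` and its argument order simply mirror the Matlab fragment, and returning Python lists is a choice made here). It evaluates (4.34) on the grid $t = k/m$ (in units of $T$), replaces the 0/0 form at $t = \pm 1/(2a)$ by $\mathrm{sinc}(t)$ times the L'Hospital limit $\pi/4$, and then checks the Nyquist property: the samples at nonzero integer multiples of $T$ vanish.

```python
import math

def raised_cosine(a, m, length):
    """Sampled raised cosine pulse (4.34), with the symbol rate normalized to 1/T = 1.

    a: excess bandwidth (0 <= a <= 1); m: samples per symbol interval;
    length: one-sided truncation, in symbol intervals.
    Returns (pulse, time_axis), each of length 2*floor(length*m) + 1.
    """
    n_side = math.floor(length * m)        # samples on each side of the peak
    pulse, time_axis = [], []
    for k in range(-n_side, n_side + 1):
        t = k / m
        if k == 0:
            value = 1.0                    # p(0) = 1
        else:
            sinc = math.sin(math.pi * t) / (math.pi * t)
            denom = 1 - (2 * a * t) ** 2
            if abs(denom) < 1e-12:
                # t = +-1/(2a): cos(pi a t)/(1 - 4 a^2 t^2) -> pi/4 (L'Hospital)
                value = sinc * math.pi / 4
            else:
                value = sinc * math.cos(math.pi * a * t) / denom
        pulse.append(value)
        time_axis.append(t)
    return pulse, time_axis

rc, t_axis = raised_cosine(0.5, 4, 10)
center = len(rc) // 2
# Nyquist check: samples at t = +-T, +-2T, ... are zero up to rounding error
print(max(abs(rc[center + 4 * j]) for j in range(1, 11)))
```

With $a = 0.5$ and $m = 4$, the sample at $t = 1$ hits the singular point, so the check above also exercises the L'Hospital branch.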

The code for the raised cosine function can also be used to generate the coefficients of a discrete time transmit filter. Here, the oversampling factor would be dictated by our DSP-centric
implementation,  and  would  usually  be  far  less  than  what  is  required  for  a  smooth  plot:  the 
digital-to-analog  converter  would  perform  the  interpolation  required  to  provide  a  smooth  analog 
waveform  for  upconversion.  A  typical  choice  is  m  =  4,  as  in  the  Matlab  code  below  for  generating 
a  noiseless  BPSK  modulated  signal. 

Upsampling:  As  noted  in  our  preview  of  digital  modulation  in  Section  2.3.2,  the  symbols  come 
in  every  T  seconds,  while  the  samples  of  the  transmit  filter  are  spaced  by  T/m.  For  example, 
the $n$th symbol contributes $b[n]p(t - nT)$ to the transmit filter output, and the $(n+1)$st symbol contributes $b[n+1]p(t - (n+1)T)$. Since $p(t - nT)$ and $p(t - (n+1)T)$ are offset by $T$, they
must  be  offset  by  m  samples  when  sampling  at  a  rate  of  m/T.  Thus,  if  the  symbols  are  input  to 
a  transmit  filter  whose  discrete  time  impulse  response  is  expressed  at  sampling  rate  m/T,  then 
successive  symbols  at  the  input  to  the  filter  must  be  spaced  by  m  samples.  That  is,  in  order  to 
get  the  output  as  a  convolution  of  the  symbols  with  the  transmit  filter  expressed  at  rate  m/T, 
we  must  insert  m  —  1  zeros  between  successive  symbols  to  convert  them  to  a  sampling  rate  of 
m/T. 
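The index bookkeeping described above can be sketched in a few lines of Python (standard library only). The helper names `upsample` and `convolve` are illustrative, with `convolve` mimicking the full-length output of Matlab's `conv`; the three-tap pulse is a hypothetical stand-in for a sampled transmit filter, chosen short so that the output can be checked by hand. Each symbol's scaled copy of the pulse starts exactly $m$ samples after the previous one.

```python
def upsample(symbols, m):
    """Insert m - 1 zeros between successive symbols.

    The output has length 1 + (len(symbols) - 1) * m, matching
    nsymbols_upsampled in Code Fragment 4.B.2.
    """
    out = [0.0] * (1 + (len(symbols) - 1) * m)
    out[::m] = symbols                   # symbol n lands at index n*m
    return out

def convolve(x, h):
    """Full linear convolution; output length len(x) + len(h) - 1 (like Matlab's conv)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

symbols = [1, -1, 1]        # three BPSK symbols
m = 4                       # samples per symbol interval
pulse = [1.0, 2.0, 1.0]     # hypothetical short transmit filter
print(upsample(symbols, m))
# [1, 0.0, 0.0, 0.0, -1, 0.0, 0.0, 0.0, 1]
print(convolve(upsample(symbols, m), pulse))
# [1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0, 0.0, 1.0, 2.0, 1.0]
```

Because the pulse here is shorter than $m$ samples, the copies do not overlap; with a realistic raised cosine filter spanning many symbol intervals, the same code simply adds the overlapping tails.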

For completeness, we reproduce part of the upsampling Code Fragment 2.3.2 below in implementing a raised cosine transmit filter.

Code  Fragment  4.B.2  (Sampled  transmitter  output) 

oversampling_factor = 4;
m = oversampling_factor;

% parameters for sampled raised cosine pulse
a = 0.5;
length = 10; % (truncated outside [-length*T, length*T])

% raised cosine transmit filter (time vector set to a dummy variable which is not used)
[transmit_filter, dummy] = raised_cosine(a, m, length);

% NUMBER OF SYMBOLS
nsymbols = 100;

% BPSK SYMBOL GENERATION
symbols = sign(rand(nsymbols, 1) - .5);

% UPSAMPLE BY m
nsymbols_upsampled = 1 + (nsymbols - 1)*m; % length of upsampled symbol sequence
symbols_upsampled = zeros(nsymbols_upsampled, 1); % initialize
symbols_upsampled(1:m:nsymbols_upsampled) = symbols; % insert symbols with spacing m

% NOISELESS MODULATED SIGNAL
tx_output = conv(symbols_upsampled, transmit_filter);


Let us now discuss the implementation of an alternative transmit filter, the square root raised cosine (SRRC). The frequency domain SRRC pulse is given by $G(f) = \sqrt{P(f)}$, where $P(f)$ is as in (4.33). We now need to find a sampled version of the time domain pulse $g(t)$ in order to implement linear modulation as above. While this could be done numerically by sampling the frequency domain pulse and computing an inverse DFT, we can also find an analytical formula for $g(t)$, as follows. Given the practical importance of the SRRC pulse, we provide the formula and sketch its derivation. Noting that $1 + \cos\theta = 2\cos^2(\theta/2)$, we can rewrite the frequency domain expression (4.33) for the raised cosine pulse as


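$$P(f) = \begin{cases} 1, & 0 \le |f| \le \frac{1-a}{2} \\ \cos^2\left(\frac{\pi}{2a}\left(|f| - \frac{1-a}{2}\right)\right), & \frac{1-a}{2} \le |f| \le \frac{1+a}{2} \\ 0, & \text{else} \end{cases} \qquad (4.35)$$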

We  can  now  take  the  square  root  to  get  an  analytical  expression  for  the  SRRC  pulse  in  the 
frequency  domain  as  follows: 


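$$G(f) = \begin{cases} 1, & 0 \le |f| \le \frac{1-a}{2} \\ \cos\left(\frac{\pi}{2a}\left(|f| - \frac{1-a}{2}\right)\right), & \frac{1-a}{2} \le |f| \le \frac{1+a}{2} \\ 0, & \text{else} \end{cases} \qquad (4.36)$$

(Frequency domain SRRC pulse)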

Finding  the  time  domain  SRRC  pulse  is  now  a  matter  of  computing  the  inverse  Fourier  transform. 
Since  it  is  also  an  interesting  exercise  in  utilizing  Fourier  transform  properties,  we  sketch  the 
derivation.  First,  we  break  up  the  frequency  domain  pulse  into  segments  whose  inverse  Fourier 
transforms are well known. Setting $b = \frac{1-a}{2}$, we have


$$G(f) = G_1(f) + G_2(f) \qquad (4.37)$$

where

$$G_1(f) = I_{[-b,b]}(f) \leftrightarrow g_1(t) = 2b\,\mathrm{sinc}(2bt) = \frac{\sin(2\pi bt)}{\pi t} = \frac{\sin(\pi(1-a)t)}{\pi t} \qquad (4.38)$$

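and

$$G_2(f) = U(f-b) + U(-f-b) \qquad (4.39)$$

with

$$U(f) = \cos\left(\frac{\pi f}{2a}\right) I_{[0,a]}(f) = \frac{1}{2}\left(e^{j\pi f/(2a)} + e^{-j\pi f/(2a)}\right) I_{[0,a]}(f) \qquad (4.40)$$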

To evaluate $G_2(f)$, note first that

$$I_{[0,a]}(f) \leftrightarrow a\,\mathrm{sinc}(at)\,e^{j\pi a t} \qquad (4.41)$$

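Multiplication by $e^{j\pi f/(2a)}$ in the frequency domain corresponds to a leftward time shift by $\frac{1}{4a}$, while multiplication by $e^{-j\pi f/(2a)}$ corresponds to a rightward time shift by $\frac{1}{4a}$. From (4.40) and (4.41), we therefore obtain that

$$U(f) = \cos\left(\frac{\pi f}{2a}\right) I_{[0,a]}(f) \leftrightarrow u(t) = \frac{a}{2}\,\mathrm{sinc}\left(a\left(t + \frac{1}{4a}\right)\right) e^{j\pi a\left(t + \frac{1}{4a}\right)} + \frac{a}{2}\,\mathrm{sinc}\left(a\left(t - \frac{1}{4a}\right)\right) e^{j\pi a\left(t - \frac{1}{4a}\right)}$$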
Simplifying, we obtain that

$$u(t) = \frac{2a\,e^{j2\pi a t} - j8a^2t}{\pi(1 - 16a^2t^2)} \qquad (4.42)$$

Now,

$$G_2(f) = U(f-b) + U(-f-b) \leftrightarrow g_2(t) = u(t)\,e^{j2\pi bt} + u^*(t)\,e^{-j2\pi bt} = \mathrm{Re}\left\{2u(t)\,e^{j2\pi bt}\right\} \qquad (4.43)$$

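Plugging in (4.42), and substituting the value of $b = \frac{1-a}{2}$, we obtain upon simplification that

$$2u(t)\,e^{j2\pi bt} = \frac{1}{\pi}\,\frac{4a\,e^{j\pi(1+a)t} - j16a^2t\,e^{j\pi(1-a)t}}{1 - 16a^2t^2}$$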


Taking the real part, we obtain

$$g_2(t) = \frac{1}{\pi}\,\frac{4a\cos(\pi(1+a)t) + 16a^2t\sin(\pi(1-a)t)}{1 - 16a^2t^2} \qquad (4.44)$$


Combining (4.38) and (4.44) and simplifying, we obtain the following expression for the SRRC pulse $g(t) = g_1(t) + g_2(t)$:

$$g(t) = \frac{4a\cos(\pi(1+a)t) + \frac{\sin(\pi(1-a)t)}{t}}{\pi(1 - 16a^2t^2)} \qquad (4.45)$$

(Time domain SRRC pulse)


We leave it as an exercise to write Matlab code to generate a sampled version of the SRRC pulse (analogous to Code Fragment 4.B.1), taking into account the zeros in the denominator. This can then be used to generate a noiseless transmit waveform as in Code Fragment 4.B.2 simply by replacing the transmit filter by an SRRC pulse.
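As a numerical sanity check on the derivation, the following Python sketch (standard library only) compares the closed form (4.45) with a direct numerical evaluation of the inverse Fourier transform of (4.36). Because $G(f)$ is real and even, $g(t) = 2\int_0^{(1+a)/2} G(f)\cos(2\pi f t)\,df$; the integral is computed with composite Simpson's rule, split at the corner frequency $(1-a)/2$. The sketch deliberately evaluates (4.45) only away from $t = 0$ and $t = \pm 1/(4a)$, where it takes a 0/0 form, so it leaves the exercise above open. The function names, the value of $a$, and the step count are illustrative assumptions.

```python
import math

def srrc_freq(f, a):
    """Frequency-domain SRRC pulse G(f) of (4.36), with 1/T = 1 and 0 < a <= 1."""
    f = abs(f)
    lo, hi = (1 - a) / 2, (1 + a) / 2
    if f <= lo:
        return 1.0
    if f <= hi:
        return math.cos(math.pi / (2 * a) * (f - lo))
    return 0.0

def srrc_time(t, a):
    """Closed form (4.45); valid only away from t = 0 and t = +-1/(4a)."""
    num = 4 * a * math.cos(math.pi * (1 + a) * t) + math.sin(math.pi * (1 - a) * t) / t
    return num / (math.pi * (1 - 16 * a * a * t * t))

def simpson(func, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3

def srrc_time_numeric(t, a):
    """g(t) = 2 * integral_0^{(1+a)/2} G(f) cos(2 pi f t) df, since G(f) is real and even."""
    lo, hi = (1 - a) / 2, (1 + a) / 2

    def integrand(f):
        return srrc_freq(f, a) * math.cos(2 * math.pi * f * t)

    # Split at (1-a)/2, where G(f) changes its functional form.
    return 2 * (simpson(integrand, 0.0, lo) + simpson(integrand, lo, hi))

a = 0.5
for t in (0.3, 0.7, 1.3, 2.1):      # avoids t = 0 and t = 1/(4a) = 0.5
    print(t, srrc_time(t, a), srrc_time_numeric(t, a))
```

The numerical integral is also well defined at $t = 0$, where it should equal $1 - a + 4a/\pi$, the limit of (4.45) as $t \to 0$.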
Chapter 5

Probability  and  Random  Processes 


Probability theory is fundamental to communication system design, especially for digital communication. Not only are there uncontrolled sources of uncertainty such as noise, interference, and other channel impairments that are only amenable to statistical modeling, but the very notion of information underlying digital communication is based on uncertainty. In particular, the receiver in a communication system does not know a priori what the transmitter is sending (otherwise the transmission would be pointless), hence the receiver designer must employ statistical models for the transmitted signal. In this chapter, we review basic concepts of probability and random variables with examples motivated by communications applications. We also introduce the concept of random processes, which are used to model both signals and noise in communication systems.

Chapter Plan: The goal of this chapter is to develop the statistical modeling tools required in later chapters. Sections 5.1 through 5.5 provide a review of background material on probability and random variables. Section 5.1 discusses basic concepts of probability: the most important of these for our purpose are the concepts of conditional probability and Bayes’ rule. Sections 5.2 and 5.4 discuss random variables and functions of random variables. Multiple random variables, or random vectors, are discussed in Section 5.3. Section 5.5 discusses various statistical averages and their computation. Material which is not part of the assumed background starts with Section 5.6; this section goes in depth into Gaussian random variables and vectors, which play a critical role in the mathematical modeling of communication systems. Section 5.7 introduces random processes in sufficient depth that we can describe, and perform elementary computations with, the classical white Gaussian noise (WGN) model in Section 5.8. At this point, zealous followers of a “just in time” philosophy can move on to the discussion of optimal receiver design in Chapter 6. However, many others might wish to go through one more section, Section 5.9, which provides a more general treatment of the effect of linear operations on random processes. The results in this section allow us, for example, to model noise correlations and to compute quantities such as signal-to-noise ratio (SNR). Material which we do not build on in later chapters, but which may be of interest to some readers, is placed in the appendices: this includes limit theorems, qualitative discussion of noise mechanisms, discussion of the structure of passband random processes, and quantification, via SNR computations, of the effect of noise on analog modulation.


5.1  Probability  Basics 

In this section, we remind ourselves of some important definitions and properties.

Sample Space: The starting point in probability is the notion of an experiment whose outcome is not deterministic. The set of all possible outcomes from the experiment is termed the sample space $\Omega$. For example, the sample space corresponding to the throwing of a six-sided die is $\Omega = \{1, 2, 3, 4, 5, 6\}$. An analogous example which is well-suited to our purpose is the sequence of bits sent by the transmitter in a digital communication system, modeled probabilistically by the receiver. For example, suppose that the transmitter can send a sequence of seven bits, each taking the value 0 or 1. Then our sample space consists of the $2^7 = 128$ possible bit sequences.

Event:  Events  are  sets  of  possible  outcomes  to  which  we  can  assign  a  probability.  That  is,  an 
event  is  a  subset  of  the  sample  space.  For  example,  for  a  six-sided  die,  the  event  {1,  3,  5}  is  the 
set  of  odd-numbered  outcomes. 
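The seven-bit sample space and the die event can be made concrete with a few lines of Python (standard library only). Purely for illustration, the sketch also assumes that all bit sequences are equally likely, an assumption the text has not made at this point, and computes the probability of a hypothetical event, “the first bit is 1,” by counting outcomes.

```python
from fractions import Fraction
from itertools import product

# Sample space: all sequences of seven bits, each bit 0 or 1.
omega = list(product((0, 1), repeat=7))
print(len(omega))                        # 128

# An event is a subset of the sample space; here, "the first bit is 1".
first_bit_one = [w for w in omega if w[0] == 1]

# Assumption for illustration only: all 128 outcomes are equally likely.
prob = Fraction(len(first_bit_one), len(omega))
print(prob)                              # 1/2

# The die example: the event of odd-numbered outcomes.
die = {1, 2, 3, 4, 5, 6}
odd = {w for w in die if w % 2 == 1}
print(sorted(odd))                       # [1, 3, 5]
```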


Figure  5.1:  Basic  set  operations. 


We are often interested in probabilities of events obtained from other events by basic set operations such as complementation, unions and intersections; see Figure 5.1.

Complement of an Event (“NOT”): For an event $A$, the complement (“not $A$”), denoted by $A^c$, is the set of outcomes that do not belong to $A$.

Union of Events (“OR”): The union of two events $A$ and $B$, denoted by $A \cup B$, is the set of all outcomes that belong to either $A$ or $B$. The term “or” always refers to the inclusive or, unless we specify otherwise. Thus, outcomes belonging to both events are included in the union.

Intersection of Events (“AND”): The intersection of two events $A$ and $B$, denoted by $A \cap B$, is the set of all outcomes that belong to both $A$ and $B$.

Mutually Exclusive, or Disjoint, Events: Events $A$ and $B$ are mutually exclusive, or disjoint, if their intersection is empty: $A \cap B = \emptyset$.

Difference of Events: The difference $A \setminus B$ is the set of all outcomes that belong to $A$ but not to $B$. In other words, $A \setminus B = A \cap B^c$.
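Python's built-in `set` type implements exactly these operations, which makes them easy to experiment with. The sketch below uses the die sample space; the events $A$ (odd outcome) and $B$ (outcome at most 3) are arbitrary illustrative choices.

```python
omega = {1, 2, 3, 4, 5, 6}       # sample space for one throw of a die
A = {1, 3, 5}                    # odd outcomes
B = {1, 2, 3}                    # outcomes at most 3

complement_A = omega - A         # A^c ("not A")
union = A | B                    # A or B (inclusive)
intersection = A & B             # A and B
difference = A - B               # A \ B

print(sorted(complement_A), sorted(union), sorted(intersection), sorted(difference))
# [2, 4, 6] [1, 2, 3, 5] [1, 3] [5]

# A \ B coincides with A intersected with B^c
print(difference == A & (omega - B))       # True
# A and A^c are mutually exclusive
print(A.isdisjoint(complement_A))          # True
```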

Probability  Measure:  A  probability  measure  is  a  function  that  assigns  probability  to  events. 
Some  properties  are  as  follows. 

Range of probability: For any event $A$, we have $0 \le P[A] \le 1$. The probability of the empty set is zero: $P[\emptyset] = 0$. The probability of the entire sample space is one: $P[\Omega] = 1$.

Probabilities of disjoint events add up: If two events $A$ and $B$ are mutually exclusive, then the probability of their union equals the sum of their probabilities:

$$P[A \cup B] = P[A] + P[B] \quad \text{if } A \cap B = \emptyset \qquad (5.1)$$
By mathematical induction, we can infer that the probability of the union of a finite number of pairwise disjoint events also adds up. It is useful to review the principle of mathematical induction via this example. Specifically, suppose that we are given pairwise disjoint events $A_1, A_2, A_3, \ldots$. We wish to prove that, for any $n \ge 2$,

$$P[A_1 \cup A_2 \cup \cdots \cup A_n] = P[A_1] + \cdots + P[A_n] \quad \text{if } A_i \cap A_j = \emptyset \text{ for all } i \ne j \qquad (5.2)$$

Mathematical  induction  consists  of  the  following  steps: 

(a) verify that the result is true for the initial value of $n$, which in our case is $n = 2$;

(b) assume that the result is true for an arbitrary value of $n = k$;

(c) use (a) and (b) to prove that the result is true for $n = k + 1$.

In our case, step (a) does not require any work; it holds by virtue of our assumption of (5.1). Now, assume that (5.2) holds for $n = k$. We can write

$$A_1 \cup A_2 \cup \cdots \cup A_k \cup A_{k+1} = B \cup A_{k+1}$$

where

$$B = A_1 \cup A_2 \cup \cdots \cup A_k$$

Since $A_{k+1}$ is disjoint from each of $A_1, \ldots, A_k$, the events $B$ and $A_{k+1}$ are disjoint. We can therefore conclude, using step (a), that

$$P[B \cup A_{k+1}] = P[B] + P[A_{k+1}]$$

But using step (b), we know that

$$P[B] = P[A_1 \cup A_2 \cup \cdots \cup A_k] = P[A_1] + \cdots + P[A_k]$$

We can now conclude that

$$P[A_1 \cup A_2 \cup \cdots \cup A_k \cup A_{k+1}] = P[A_1] + \cdots + P[A_{k+1}]$$

thus accomplishing step (c).

The  preceding  properties  are  typically  stated  as  axioms,  which  provide  the  starting  point  from 
which  other  properties,  some  of  which  are  stated  below,  can  be  derived. 

Probability of the complement of an event: The probabilities of an event and its complement
sum to one. By definition, $A$ and $A^c$ are disjoint, and $A \cup A^c = \Omega$. Since $P[\Omega] = 1$, we can
now apply (5.1) to infer that

$$P[A] + P[A^c] = 1 \tag{5.3}$$

Probabilities of unions and intersections: We can use the property (5.1) to infer the following
property regarding the union and intersection of arbitrary events:

$$P[A_1 \cup A_2] = P[A_1] + P[A_2] - P[A_1 \cap A_2] \tag{5.4}$$

Let us get a feel for how to use the probability axioms by proving this. We break $A_1 \cup A_2$ into
disjoint events as follows:

$$A_1 \cup A_2 = A_2 \cup (A_1 \setminus A_2)$$

Applying (5.1), we have

$$P[A_1 \cup A_2] = P[A_2] + P[A_1 \setminus A_2] \tag{5.5}$$

Furthermore, since $A_1$ can be written as the disjoint union $A_1 = (A_1 \cap A_2) \cup (A_1 \setminus A_2)$, we have
$P[A_1] = P[A_1 \cap A_2] + P[A_1 \setminus A_2]$, or $P[A_1 \setminus A_2] = P[A_1] - P[A_1 \cap A_2]$. Plugging into (5.5), we
obtain (5.4).
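To make (5.4) concrete, here is a minimal Python sketch on a hypothetical finite sample space: a single roll of a fair six-sided die with equally likely outcomes, so that probabilities reduce to counting. The die, the events `A1` and `A2`, and the helper `prob` are illustrative assumptions, not part of the text; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair die, all outcomes equally likely.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """P[event] = |event| / |Omega| for equally likely outcomes."""
    return Fraction(len(set(event) & OMEGA), len(OMEGA))

A1 = {2, 4, 6}  # "the roll is even"
A2 = {4, 5, 6}  # "the roll is at least 4"

lhs = prob(A1 | A2)                        # P[A1 union A2]
rhs = prob(A1) + prob(A2) - prob(A1 & A2)  # right-hand side of (5.4)
print(lhs, rhs)                            # 2/3 2/3
```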
Conditional probability: The conditional probability of $A$ given $B$ is the probability of $A$
assuming that we already know that the outcome of the experiment is in $B$. Outcomes corresponding
to this probability must therefore belong to the intersection $A \cap B$. We therefore define
the conditional probability as

$$P[A \mid B] = \frac{P[A \cap B]}{P[B]} \tag{5.6}$$

(We assume that $P[B] > 0$; otherwise the condition we are assuming cannot occur.)

Conditional probabilities behave just the same as regular probabilities, since all we are doing is
restricting the sample space to the event being conditioned on. Thus, we still have
$P[A \mid B] = 1 - P[A^c \mid B]$ and

$$P[A_1 \cup A_2 \mid B] = P[A_1 \mid B] + P[A_2 \mid B] - P[A_1 \cap A_2 \mid B]$$
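The idea that conditioning simply restricts the sample space can be checked on the same kind of hypothetical fair-die model (an illustrative assumption, not a model from the text). The sketch below implements (5.6) directly as a hypothetical helper `cond_prob` and confirms that the complement rule and (5.4) continue to hold under conditioning.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # hypothetical fair-die sample space

def prob(event):
    """P[event] = |event| / |Omega| for equally likely outcomes."""
    return Fraction(len(set(event) & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P[A | B] = P[A intersect B] / P[B], as in (5.6); requires P[B] > 0."""
    pb = prob(b)
    if pb == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return prob(set(a) & set(b)) / pb

B = {4, 5, 6}               # condition: the roll is at least 4
A1, A2 = {2, 4, 6}, {5, 6}

print(cond_prob(A1, B))                # 2/3
print(1 - cond_prob(OMEGA - A1, B))    # 2/3, the complement rule under conditioning
print(cond_prob(A1 | A2, B)
      == cond_prob(A1, B) + cond_prob(A2, B) - cond_prob(A1 & A2, B))  # True
```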

Conditioning is a crucial concept in models for digital communication systems. A typical application
is to condition on which of a number of possible transmitted signals is sent, in order
to describe the statistical behavior of the communication medium. Such statistical models then
form the basis for receiver design and performance analysis.


Example 5.1.1 (a binary channel):

Figure 5.2: Conditional probabilities modeling a binary channel.


Figure  5.2  depicts  the  conditional  probabilities  for  a  noisy  binary  channel.  On  the  left  side  are 
the  two  possible  values  of  the  bit  sent,  and  on  the  right  are  the  two  possible  values  of  the  bit 
received.  The  labels  on  a  given  arrow  are  the  conditional  probability  of  the  received  bit,  given 
the  transmitted  bit.  Thus,  the  binary  channel  is  defined  by  means  of  the  following  conditional 
probabilities: 


$$P[0 \text{ received} \mid 0 \text{ transmitted}] = 1 - a, \qquad P[1 \text{ received} \mid 0 \text{ transmitted}] = a$$

$$P[0 \text{ received} \mid 1 \text{ transmitted}] = b, \qquad P[1 \text{ received} \mid 1 \text{ transmitted}] = 1 - b$$

These conditional probabilities are often termed the channel transition probabilities. The probabilities
$a$ and $b$ are called the crossover probabilities. When $a = b$, we obtain the binary symmetric
channel.

Law of total probability: For events $A$ and $B$, we have

$$P[A] = P[A \cap B] + P[A \cap B^c] = P[A \mid B]\,P[B] + P[A \mid B^c]\,P[B^c] \tag{5.7}$$

In the above, we have decomposed an event of interest, $A$, into a disjoint union of two events,
$A \cap B$ and $A \cap B^c$, so that (5.1) applies. The sets $B$ and $B^c$ form a partition of the entire sample
space; that is, they are disjoint, and their union equals $\Omega$. This generalizes to any partition of
the sample space; that is, if $B_1, B_2, \ldots$ are mutually exclusive events such that their union covers
the sample space (actually, it is enough if the union contains $A$), then

$$P[A] = \sum_i P[A \cap B_i] = \sum_i P[A \mid B_i]\,P[B_i] \tag{5.8}$$


Example 5.1.2 (Applying the law of total probability to the binary channel): For the
channel in Figure 5.2, set $a = 0.1$ and $b = 0.25$, and suppose that the probability of transmitting
0 is 0.6. This is called the prior, or a priori, probability of transmitting 0, because it is the
statistical information that the receiver has before it sees the received bit. Using (5.3), the prior
probability of 1 being transmitted is

$$P[1 \text{ transmitted}] = 1 - P[0 \text{ transmitted}] = 1 - 0.6 = 0.4$$

(since sending 0 or 1 are our only options for this particular channel model, the two events are
complements of each other). We can now compute the probability that 0 is received using the
law of total probability, as follows.

$$\begin{aligned}
P[0 \text{ received}] &= P[0 \text{ received} \mid 0 \text{ transmitted}]\,P[0 \text{ transmitted}] + P[0 \text{ received} \mid 1 \text{ transmitted}]\,P[1 \text{ transmitted}] \\
&= 0.9 \times 0.6 + 0.25 \times 0.4 = 0.64
\end{aligned}$$

We can also compute the probability that 1 is received using the same technique, but it is easier
to infer this from (5.3) as follows:

$$P[1 \text{ received}] = 1 - P[0 \text{ received}] = 0.36$$
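The computation in Example 5.1.2 is a weighted sum over the transmitted bit, which can be sketched in a few lines of Python. The dictionary layout `channel[tx][rx]` for the transition probabilities and the helper name `prob_received` are illustrative choices, not notation from the text.

```python
# Binary channel of Figure 5.2 with a = 0.1, b = 0.25, as in Example 5.1.2.
a, b = 0.1, 0.25
prior = {0: 0.6, 1: 0.4}  # P[bit transmitted]

# Transition probabilities: channel[tx][rx] = P[rx received | tx transmitted]
channel = {
    0: {0: 1 - a, 1: a},
    1: {0: b, 1: 1 - b},
}

def prob_received(rx):
    """Law of total probability (5.7), conditioning on the transmitted bit."""
    return sum(channel[tx][rx] * prior[tx] for tx in prior)

p0 = prob_received(0)
p1 = prob_received(1)
print(round(p0, 4), round(p1, 4))  # 0.64 0.36
```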


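Bayes’ rule: Given $P[A \mid B]$, we compute $P[B \mid A]$ as follows:

$$P[B \mid A] = \frac{P[A \mid B]\,P[B]}{P[A]} = \frac{P[A \mid B]\,P[B]}{P[A \mid B]\,P[B] + P[A \mid B^c]\,P[B^c]} \tag{5.9}$$

where we have used (5.7). Similarly, in the setting of (5.8), we can compute $P[B_j \mid A]$ as follows:

$$P[B_j \mid A] = \frac{P[A \mid B_j]\,P[B_j]}{P[A]} = \frac{P[A \mid B_j]\,P[B_j]}{\sum_i P[A \mid B_i]\,P[B_i]} \tag{5.10}$$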


Bayes’ rule is typically used as follows in digital communication. The event $B$ might correspond
to which transmitted signal was sent. The event $A$ may describe the received signal, so that
$P[A \mid B]$ can be computed based on our model for the statistics of the received signal, given
the transmitted signal. Bayes’ rule can then be used to compute the conditional probability
$P[B \mid A]$ of a given signal having been transmitted, given information about the received signal,
as illustrated in the example below.


Example 5.1.3 (Applying Bayes’ rule to the binary channel): Continuing with the binary
channel of Figure 5.2 with $a = 0.1$, $b = 0.25$, let us find the probability that 0 was transmitted,
given that 0 is received. This is called the posterior, or a posteriori, probability of 0 being
transmitted, because it is the statistical model that the receiver infers after it sees the received
bit. As in Example 5.1.2, we assume that the prior probability of 0 being transmitted is 0.6. We
now apply Bayes’ rule as follows:

$$P[0 \text{ transmitted} \mid 0 \text{ received}] = \frac{P[0 \text{ received} \mid 0 \text{ transmitted}]\,P[0 \text{ transmitted}]}{P[0 \text{ received}]} = \frac{0.9 \times 0.6}{0.64} = \frac{27}{32}$$

where we have used the computation from Example 5.1.2, based on the law of total probability,
for the denominator. We can also compute the posterior probability of the complementary event
as follows:

$$P[1 \text{ transmitted} \mid 0 \text{ received}] = 1 - P[0 \text{ transmitted} \mid 0 \text{ received}] = \frac{5}{32}$$

These  results  make  sense.  Since  the  binary  channel  in  Figure  5.2  has  a  small  probability  of  error, 
it  is  much  more  likely  that  0  was  transmitted  than  that  1  was  transmitted  when  we  receive  0.  The 
situation  would  be  reversed  if  1  were  received.  The  computation  of  the  corresponding  posterior 
probabilities  is  left  as  an  exercise.  Note  that,  for  this  example,  the  numerical  values  for  the 
posterior  probabilities  may  be  different  when  we  condition  on  1  being  received,  since  the  channel 
transition  probabilities  and  prior  probabilities  are  not  symmetric  with  respect  to  exchanging  the 
roles  of  0  and  1. 
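Bayes’ rule (5.10) for the binary channel can be sketched in the same way, here with exact fractions so that the result $27/32$ from Example 5.1.3 appears exactly. The helper `posterior` and the `channel[tx][rx]` layout are hypothetical names used only for illustration.

```python
from fractions import Fraction

# Binary channel of Figure 5.2 with a = 0.1, b = 0.25 and prior P[0 transmitted] = 0.6.
a, b = Fraction(1, 10), Fraction(1, 4)
prior = {0: Fraction(3, 5), 1: Fraction(2, 5)}
channel = {0: {0: 1 - a, 1: a}, 1: {0: b, 1: 1 - b}}  # channel[tx][rx]

def posterior(tx, rx):
    """Bayes' rule (5.10): P[tx transmitted | rx received]."""
    evidence = sum(channel[t][rx] * prior[t] for t in prior)  # law of total probability
    return channel[tx][rx] * prior[tx] / evidence

print(posterior(0, 0), posterior(1, 0))  # 27/32 5/32
```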


Two  other  concepts  that  we  use  routinely  are  independence  and  conditional  independence. 
Independence: Events $A_1$ and $A_2$ are independent if

$$P[A_1 \cap A_2] = P[A_1]\,P[A_2] \tag{5.11}$$


Example 5.1.4 (independent bits): Suppose we transmit three bits. Each time, the probability
of sending 0 is 0.6. Assuming that the bits to be sent are selected independently each of
these three times, we can compute the probability of sending any given three-bit sequence using
(5.11).

$$\begin{aligned}
P[000 \text{ transmitted}] &= P[\text{first bit} = 0, \text{second bit} = 0, \text{third bit} = 0] \\
&= P[\text{first bit} = 0]\,P[\text{second bit} = 0]\,P[\text{third bit} = 0] = 0.6^3 = 0.216
\end{aligned}$$

Let us do a few other computations similarly, where we now use the shorthand $P[x_1 x_2 x_3]$ to
denote that $x_1 x_2 x_3$ is the sequence of three bits transmitted.

$$P[101] = 0.4 \times 0.6 \times 0.4 = 0.096$$

and

$$P[\text{two ones transmitted}] = P[110] + P[101] + P[011] = 3 \times (0.4)^2 \times 0.6 = 0.288$$
The  number  of  ones  is  actually  a  binomial  random  variable  (reviewed  in  Section  5.2). 
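Under the independence assumption of Example 5.1.4, the probability of any sequence is a product of per-bit probabilities. The sketch below enumerates all eight three-bit sequences, reproduces the numbers above, and tallies the distribution of the number of ones; the helper `seq_prob` is an illustrative name.

```python
from itertools import product

P0 = 0.6  # probability of sending 0 in each position (Example 5.1.4)

def seq_prob(bits):
    """Probability of a bit sequence when bits are chosen independently, using (5.11)."""
    p = 1.0
    for bit in bits:
        p *= P0 if bit == 0 else 1 - P0
    return p

# Enumerate all 2^3 sequences and group them by the number of ones.
dist = {}
for bits in product((0, 1), repeat=3):
    k = sum(bits)
    dist[k] = dist.get(k, 0.0) + seq_prob(bits)

print(round(seq_prob((0, 0, 0)), 4))  # 0.216
print(round(seq_prob((1, 0, 1)), 4))  # 0.096
print(round(dist[2], 4))              # 0.288
```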


Conditional Independence: Events $A_1$ and $A_2$ are conditionally independent given $B$ if

$$P[A_1 \cap A_2 \mid B] = P[A_1 \mid B]\,P[A_2 \mid B] \tag{5.12}$$


Example 5.1.5 (independent channel uses): Now, suppose that we transmit three bits,
with each bit seeing the binary channel depicted in Figure 5.2. We say that the channel is memoryless
when the value of the received bit corresponding to a given channel use is conditionally
independent of the other received bits, given the transmitted bits. For the setting of Example
5.1.4, where we choose the transmitted bits independently, the following calculation illustrates the
computation of conditional probabilities for the received bits.

$$\begin{aligned}
P[100 \text{ received} \mid 010 \text{ transmitted}]
&= P[1 \text{ received} \mid 0 \text{ transmitted}]\,P[0 \text{ received} \mid 1 \text{ transmitted}]\,P[0 \text{ received} \mid 0 \text{ transmitted}] \\
&= 0.1 \times 0.25 \times 0.9 = 0.0225
\end{aligned}$$
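For a memoryless channel, the conditional probability of a received sequence given a transmitted sequence is a product over channel uses. A minimal sketch, reusing the transition probabilities of Example 5.1.5 in a hypothetical `channel[tx][rx]` table (the helper name is also illustrative):

```python
# Memoryless binary channel of Figure 5.2 with a = 0.1, b = 0.25 (Example 5.1.5).
a, b = 0.1, 0.25
channel = {0: {0: 1 - a, 1: a}, 1: {0: b, 1: 1 - b}}  # channel[tx][rx]

def prob_received_given_sent(received, sent):
    """Product of per-use transition probabilities (conditional independence, (5.12))."""
    p = 1.0
    for rx, tx in zip(received, sent):
        p *= channel[tx][rx]
    return p

print(round(prob_received_given_sent((1, 0, 0), (0, 1, 0)), 6))  # 0.0225
```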
We end this section with a mention of two useful bounding techniques.

Union bound: The probability of a union of events is upper bounded by the sum of the
probabilities of the events.

$$P[A_1 \cup A_2] \le P[A_1] + P[A_2] \tag{5.13}$$

This follows from (5.4) by noting that $P[A_1 \cap A_2] \ge 0$. This property generalizes to a union of
a collection of events by mathematical induction:

$$P\left[\bigcup_{i=1}^{n} A_i\right] \le \sum_{i=1}^{n} P[A_i] \tag{5.14}$$
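To see how tight (5.14) can be, consider a hypothetical setting not taken from the text: three uses of a binary symmetric channel with crossover probability 0.1, with errors independent across uses, and $A_i$ the event that bit $i$ is received in error. The sketch compares the exact probability of at least one error with the union bound.

```python
from itertools import product

# Hypothetical setting: three independent uses of a binary symmetric channel
# with crossover probability 0.1. A_i = "bit i is received in error".
p_err = 0.1
n = 3

# Exact P[A_1 union A_2 union A_3], enumerating error patterns (1 = error).
exact = 0.0
for pattern in product((0, 1), repeat=n):
    prob = 1.0
    for e in pattern:
        prob *= p_err if e else 1 - p_err
    if any(pattern):
        exact += prob

union_bound = n * p_err  # right-hand side of (5.14)
print(round(exact, 4), round(union_bound, 4))  # 0.271 0.3
```

Here the bound exceeds the exact value by less than 0.03.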


If $A$ implies $B$, then $P[A] \le P[B]$: An event $A$ implies an event $B$ (denoted by $A \Rightarrow B$) if
and only if $A$ is contained in $B$ (i.e., $A \subseteq B$). In this case, we can write $B$ as a disjoint union as
follows: $B = A \cup (B \setminus A)$. This means that $P[B] = P[A] + P[B \setminus A] \ge P[A]$, since $P[B \setminus A] \ge 0$.


5.2  Random  Variables 


Figure  5.3:  A  random  variable  is  a  mapping  from  the  sample  space  to  the  real  line. 


A  random  variable  assigns  a  number  to  each  outcome  of  a  random  experiment.  That  is,  a 
random variable is a mapping from the sample space $\Omega$ to the set of real numbers, as shown in
Figure  5.3.  The  underlying  experiment  that  leads  to  the  outcomes  in  the  sample  space  can  be 
quite  complicated  (e.g.,  generation  of  a  noise  sample  in  a  communication  system  may  involve 
the  random  movement  of  a  large  number  of  charge  carriers,  as  well  as  the  filtering  operation 
performed  by  the  receiver).  However,  we  do  not  need  to  account  for  these  underlying  physical 
phenomena  in  order  to  specify  the  probabilistic  description  of  the  random  variable.  All  we  need 
to  do  is  to  describe  how  to  compute  the  probabilities  of  the  random  variable  taking  on  a  particular 
set  of  values.  In  other  words,  we  need  to  specify  its  probability  distribution,  or  probability  law. 
Consider,  for  example,  the  Bernoulli  random  variable,  which  may  be  used  to  model  random  bits 
sent  by  a  transmitter,  or  to  indicate  errors  in  these  bits  at  the  receiver. 

Bernoulli random variable: $X$ is a Bernoulli random variable if it takes values 0 or 1. The
probability distribution is specified if we know $P[X = 0]$ and $P[X = 1]$. Since $X$ can take only
one of these two values, the events $\{X = 0\}$ and $\{X = 1\}$ constitute a partition of the sample
space, so that $P[X = 0] + P[X = 1] = 1$. We therefore can characterize the Bernoulli distribution
by a parameter $p \in [0, 1]$, where $p = P[X = 1] = 1 - P[X = 0]$. We denote this distribution as
Bernoulli($p$).

In general, if a random variable takes only a discrete set of values, then its distribution can be
specified simply by specifying the probabilities that it takes each of these values.

Discrete Random Variable, Probability Mass Function: $X$ is a discrete random variable
if it takes a finite, or countably infinite, number of values. If $X$ takes values $x_1, x_2, \ldots$, then its
probability distribution is characterized by its probability mass function (PMF), or the probabilities
$p_i = P[X = x_i]$, $i = 1, 2, \ldots$ These probabilities must add up to one, $\sum_i p_i = 1$, since the
events $\{X = x_i\}$, $i = 1, 2, \ldots$ provide a partition of the sample space.

For  random  variables  that  take  values  in  a  continuum,  the  probability  of  taking  any  particular 
value  is  zero.  Rather,  we  seek  to  specify  the  probability  that  the  value  taken  by  the  random 
variable  falls  in  a  given  set  of  interest.  By  choosing  these  sets  to  be  intervals  whose  size  shrinks 
to  zero,  we  arrive  at  the  notion  of  probability  density  function,  as  follows. 

Continuous Random Variable, Probability Density Function: $X$ is a continuous random
variable if the probability $P[X = x]$ is zero for each $x$. In this case, we define the probability
density function (PDF) as follows:

$$p_X(x) = \lim_{\Delta x \to 0} \frac{P[x \le X \le x + \Delta x]}{\Delta x} \tag{5.15}$$

In other words, for small intervals, we have the approximate relationship:

$$P[x \le X \le x + \Delta x] \approx p_X(x)\,\Delta x$$

Expressing an event of interest as a disjoint union of such small intervals, the probability of the
event is the sum of the probabilities of these intervals; as we let the length of the intervals shrink,
the sum becomes an integral (with $\Delta x$ replaced by $dx$). Thus, the probability of $X$ taking values
in a set $A$ can be computed by integrating its PDF over $A$, as follows:

$$P[X \in A] = \int_A p_X(x)\,dx \tag{5.16}$$

The PDF must integrate to one over the real line, since any value taken by $X$ falls within this
interval:

$$\int_{-\infty}^{\infty} p_X(x)\,dx = 1$$


Notation: We use the notation $p_X(x)$ to denote the density of a random variable $X$, evaluated
at the point $x$. Thus, the argument of the density is a dummy variable, and could be denoted
by some other letter: for example, we could use the notation $p_X(u)$ as notation for the density
of $X$, evaluated at the point $u$. Once we firmly establish these concepts, however, we plan to
allow ourselves to get sloppy. As discussed in the note at the end of Section 5.3, if there is no
scope for confusion, we plan to use the dummy variable to also denote the random variable we
are talking about. For example, we use $p(x)$ as the notation for $p_X(x)$ and $p(y)$ as the notation
for $p_Y(y)$. But for now, we retain the subscripts in the introductory material in Sections 5.2 and
5.3.


Density:  We  use  the  generic  term  “density”  to  refer  to  both  PDF  and  PMF  (but  more  often 
the  PDF),  relying  on  the  context  to  clarify  what  we  mean  by  the  term. 

The PMF or PDF cannot be used to describe mixed random variables that are neither discrete
nor continuous. We can get around this problem by allowing PDFs to contain impulses, but a
general description of the probability distribution of any random variable, whether it is discrete,
continuous or mixed, can be provided in terms of its cumulative distribution function, defined
below.


Cumulative distribution function (CDF): The CDF of a random variable $X$ is defined as

$$F_X(x) = P[X \le x]$$

and has the following general properties:
(1) $F_X(x)$ is nondecreasing in $x$.


This is because, for $x_1 < x_2$, we have $\{X \le x_1\} \subseteq \{X \le x_2\}$, so that $P[X \le x_1] \le P[X \le x_2]$.

(2) $F_X(-\infty) = 0$ and $F_X(\infty) = 1$.

The event $\{X \le -\infty\}$ contains no allowable values for $X$, and is therefore the empty set, which
has probability zero. The event $\{X \le \infty\}$ contains all allowable values for $X$, and is therefore
the entire sample space, which has probability one.

(3) $F_X(x)$ is right-continuous: $F_X(x) = \lim_{\delta \to 0, \, \delta > 0} F_X(x + \delta)$. Denoting this right limit as $F_X(x^+)$,
we can state the property compactly as $F_X(x) = F_X(x^+)$.

The  proof  is  omitted,  since  it  requires  going  into  probability  theory  at  a  depth  that  is  unnecessary 
for  our  purpose. 

Any function that satisfies (1)-(3) is a valid CDF. The CDFs for discrete and mixed random
variables exhibit jumps. At each of these jumps, the left limit $F_X(x^-)$ is strictly smaller than the
right limit $F_X(x^+) = F_X(x)$. Noting that

$$P[X = x] = P[X \le x] - P[X < x] = F_X(x) - F_X(x^-) \tag{5.17}$$

we see that the jumps correspond to the discrete set of points where nonzero probability mass
is assigned. For a discrete random variable, the CDF remains constant between these jumps.
The PMF is given by applying (5.17) for $x = x_i$, $i = 1, 2, \ldots$, where $\{x_i\}$ is the set of values taken
by $X$.
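Relationship (5.17) can be checked on a concrete discrete random variable. As an illustrative choice, the sketch below uses the number of ones among the three independently chosen bits of Example 5.1.4 (each bit equal to 1 with probability 0.4), whose PMF values follow from the same product computation as in that example; it builds the CDF and recovers the PMF from the jumps. The helpers `cdf` and `left_limit` are hypothetical names.

```python
from fractions import Fraction

# Illustrative discrete random variable: number of ones among the three independent
# bits of Example 5.1.4 (each bit is 1 with probability 0.4 = 2/5).
pmf = {
    0: Fraction(27, 125),  # (3/5)^3
    1: Fraction(54, 125),  # 3 * (2/5) * (3/5)^2
    2: Fraction(36, 125),  # 3 * (2/5)^2 * (3/5), i.e. 0.288
    3: Fraction(8, 125),   # (2/5)^3
}

def cdf(x):
    """F_X(x) = P[X <= x]."""
    return sum((q for v, q in pmf.items() if v <= x), Fraction(0))

def left_limit(x):
    """F_X(x^-) = P[X < x]."""
    return sum((q for v, q in pmf.items() if v < x), Fraction(0))

# (5.17): the size of the jump at each value x_i recovers the PMF.
recovered = {x: cdf(x) - left_limit(x) for x in pmf}
print(recovered == pmf)             # True
print(cdf(1), cdf(1.5), cdf(1.99))  # 81/125 81/125 81/125: constant between jumps
```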

For a continuous random variable, there are no jumps in the CDF, since $P[X = x] = 0$ for all $x$.
That is, a continuous random variable can be defined as one whose CDF is a continuous function.
From the definition (5.15) of PDF, it is clear that the PDF of a continuous random variable is
the derivative of the CDF; that is,

$$p_X(x) = F_X'(x) \tag{5.18}$$

Actually, it is possible that the derivative of the CDF for a continuous random variable does not
exist at certain points (i.e., when the slopes of $F_X(x)$ approaching from the left and the right are
different). The PDF at these points can be defined as either the left or the right slope; it does not
make a difference in our probability computations, which involve integrating the PDF (which
washes away the effect of individual points). We therefore do not worry about this technicality
any further.

We obtain the CDF from the PDF by integrating the relationship (5.18):

$$F_X(x) = \int_{-\infty}^{x} p_X(z)\,dz \tag{5.19}$$


It is also useful to define the complementary CDF.

Complementary cumulative distribution function (CCDF): The CCDF of a random
variable $X$ is defined as

$$F_X^c(x) = P[X > x] = 1 - F_X(x)$$

The CCDF is often useful in talking about tail probabilities (e.g., the probability that a noise
sample takes a large value, causing an error at the receiver). For a continuous random variable
with PDF $p_X(x)$, the CCDF is given by

$$F_X^c(x) = \int_{x}^{\infty} p_X(z)\,dz \tag{5.20}$$


We  now  list  a  few  more  commonly  encountered  random  variables. 

Exponential random variable: The random variable $X$ has an exponential distribution with
parameter $\lambda$, which we denote as $X \sim \text{Exp}(\lambda)$, if its PDF is given by

$$p_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

Figure 5.4: PDF of an exponential random variable with parameter $\lambda = 1/5$ (or mean $1/\lambda = 5$).

See Figure 5.4 for an example PDF. We can write this more compactly using the indicator
function:

$$p_X(x) = \lambda e^{-\lambda x} I_{[0,\infty)}(x)$$

The CDF is given by

$$F_X(x) = \left(1 - e^{-\lambda x}\right) I_{[0,\infty)}(x)$$

For $x \ge 0$, the CCDF is given by

$$F_X^c(x) = P[X > x] = e^{-\lambda x}$$

That is, the tail of an exponential distribution decays (as befits its name) exponentially.
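A quick numerical sanity check of the exponential formulas, assuming $\lambda = 1/5$ as in Figure 5.4: the sketch integrates the PDF with a simple midpoint rule, as in (5.19), and compares the result with the closed-form CDF and CCDF. The midpoint integrator is a generic illustrative helper, not a method from the text.

```python
import math

lam = 1 / 5  # parameter used in Figure 5.4 (mean 1/lam = 5)

def exp_pdf(x):
    """p_X(x) = lam * exp(-lam x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x):
    """F_X(x) = 1 - exp(-lam x) for x >= 0, and 0 otherwise."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def exp_ccdf(x):
    """P[X > x] = 1 - F_X(x); equals exp(-lam x) for x >= 0."""
    return 1 - exp_cdf(x)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

x = 7.0
print(integrate(exp_pdf, 0.0, x), exp_cdf(x))  # both about 0.7534, as in (5.19)
print(exp_ccdf(x), math.exp(-lam * x))         # both about 0.2466
```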


Figure 5.5: PDF of a Gaussian random variable with parameters $m = 5$ and $v^2 = 16$. Note the
bell shape for the Gaussian density, with peak around its mean $m = 5$.
Gaussian (or normal) random variable: The random variable $X$ has a Gaussian distribution
with parameters $m$ and $v^2$, denoted $X \sim N(m, v^2)$, if its PDF is given by

$$p_X(x) = \frac{1}{\sqrt{2\pi v^2}} \exp\left(-\frac{(x - m)^2}{2v^2}\right) \tag{5.21}$$

See Figure 5.5 for an example PDF. As we show in Section 5.5, $m$ is the mean of $X$ and $v^2$ is
its variance. The PDF of a Gaussian has a well-known bell shape, as shown in Figure 5.5. The
Gaussian random variable plays a very important role in communication system design, hence
we discuss it in far more detail in Section 5.6, as a prerequisite for the receiver design principles
to be developed in Chapter 6.


Example 5.2.1 (Recognizing a Gaussian density): Suppose that a random variable $X$ has PDF

$$p_X(x) = c\,e^{-2x^2 + x}$$

where $c$ is an unknown constant, and $x$ ranges over the real line. Specify the distribution of $X$
and write down its PDF.

Solution: Any PDF with an exponential dependence on a quadratic can be put in the form (5.21)
by completing squares in the exponent:

$$-2x^2 + x = -2\left(x^2 - \frac{x}{2}\right) = -2\left[\left(x - \frac{1}{4}\right)^2 - \frac{1}{16}\right] = -2\left(x - \frac{1}{4}\right)^2 + \frac{1}{8}$$

Comparing with (5.21), we see that the PDF can be written as an $N(m, v^2)$ PDF with $m = \frac{1}{4}$
and $\frac{1}{2v^2} = 2$, so that $v^2 = \frac{1}{4}$. Thus, $X \sim N\left(\frac{1}{4}, \frac{1}{4}\right)$ and its PDF is given by specializing (5.21):

$$p_X(x) = \frac{1}{\sqrt{\pi/2}}\, e^{-2\left(x - \frac{1}{4}\right)^2}$$

We usually do not really care about going back and specifying the constant $c$, since we already
know the form of the density. But it is easy to check that $c = \sqrt{2/\pi}\, e^{-1/8}$.
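The conclusion of Example 5.2.1 can be verified numerically. The sketch below, assuming the value of $c$ stated above, integrates $c\,e^{-2x^2 + x}$ on a fine midpoint grid and checks that it has total mass one, mean $1/4$ and variance $1/4$, and that it coincides pointwise with the $N(1/4, 1/4)$ density from (5.21). Mean and variance are only formally defined in Section 5.5; here they are used simply as numerical integrals, and the helper names are illustrative.

```python
import math

c = math.sqrt(2 / math.pi) * math.exp(-1 / 8)  # constant found in Example 5.2.1

def p(x):
    """The density c * exp(-2x^2 + x) from Example 5.2.1."""
    return c * math.exp(-2 * x * x + x)

def gauss_pdf(x, m, v2):
    """N(m, v2) density, as in (5.21)."""
    return math.exp(-((x - m) ** 2) / (2 * v2)) / math.sqrt(2 * math.pi * v2)

# Midpoint grid on [-10, 10]; the density is negligible outside this interval.
lo, hi, steps = -10.0, 10.0, 20_000
h = (hi - lo) / steps
xs = [lo + (i + 0.5) * h for i in range(steps)]

total = h * sum(p(x) for x in xs)
mean = h * sum(x * p(x) for x in xs)
var = h * sum((x - mean) ** 2 * p(x) for x in xs)
print(round(total, 6), round(mean, 6), round(var, 6))  # about 1.0 0.25 0.25

# Pointwise agreement with the N(1/4, 1/4) density, up to floating-point rounding.
print(max(abs(p(x) - gauss_pdf(x, 0.25, 0.25)) for x in (-1.0, 0.0, 0.3, 2.0)))
```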


Binomial random variable: We say that a random variable $Y$ has a binomial distribution
with parameters $n$ and $p$, and denote this by $Y \sim \text{Bin}(n, p)$, if $Y$ takes integer values $0, 1, \ldots, n$,
with probability mass function

$$p_k = P[Y = k] = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n$$

Recall that “$n$ choose $k$” (the number of ways in which we can choose $k$ items out of $n$ items)
is given by the expression

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

with $k! = 1 \times 2 \times \cdots \times k$ denoting the factorial operation. The binomial distribution can be
thought of as a discrete analogue of the Gaussian distribution; as seen in Figure 5.6, the PMF
has a bell shape. We comment in more detail on this when we discuss the central limit theorem
in Appendix 5.B.
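The binomial PMF is straightforward to evaluate with Python's `math.comb` (available since Python 3.8). The sketch uses the parameters of Figure 5.6 and also cross-checks the value 0.288 from Example 5.1.4, where the number of ones among three bits is $\text{Bin}(3, 0.4)$. The helper `binom_pmf` is an illustrative name.

```python
from math import comb

def binom_pmf(k, n, p):
    """P[Y = k] for Y ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.3  # parameters of Figure 5.6
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

print(round(sum(pmf), 12))                      # 1.0
print(max(range(n + 1), key=lambda k: pmf[k]))  # 6, the peak of the bell shape
print(round(binom_pmf(2, 3, 0.4), 4))           # 0.288, as in Example 5.1.4
```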

Poisson random variable: $X$ is a Poisson random variable with parameter $\lambda > 0$ if it takes
values from the nonnegative integers, with PMF given by

$$P[X = k] = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \ldots$$

As shown later, the parameter $\lambda$ equals the mean of the Poisson random variable.
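A small sketch of the Poisson PMF, with a hypothetical parameter $\lambda = 3$ chosen only for illustration. Truncating the infinite sum at a point where the remaining terms are negligible, it checks numerically that the PMF sums to one and that its mean equals $\lambda$, as claimed above.

```python
import math

def poisson_pmf(k, lam):
    """P[X = k] = lam^k e^{-lam} / k! for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # hypothetical parameter chosen for illustration
K = 60     # truncation point; the omitted tail is negligible for lam = 3
total = sum(poisson_pmf(k, lam) for k in range(K))
mean = sum(k * poisson_pmf(k, lam) for k in range(K))
print(round(total, 10), round(mean, 10))  # 1.0 3.0
```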
Figure 5.6: PMF of a binomial random variable with n = 20 and p = 0.3.

5.3  Multiple  Random  Variables,  or  Random  Vectors 


Figure  5.7:  Multiple  random  variables  defined  on  a  common  probability  space. 


We  are  often  interested  in  more  than  one  random  variable  when  modeling  a  particular  scenario 
of  interest.  For  example,  a  model  of  a  received  sample  in  a  communication  link  may  involve  a 
randomly  chosen  transmitted  bit,  a  random  channel  gain,  and  a  random  noise  sample.  In  general, 
we  are  interested  in  multiple  random  variables  defined  on  a  “common  probability  space,”  where 
the  latter  phrase  means  simply  that  we  can,  in  principle,  compute  the  probability  of  events 
involving  all  of  these  random  variables.  Technically,  multiple  random  variables  on  a  common 
probability space are simply different mappings from the sample space $\Omega$ to the real line, as
depicted  in  Figure  5.7.  However,  in  practice,  we  do  not  usually  worry  about  the  underlying 
sample  space  (which  can  be  very  complicated),  and  simply  specify  the  joint  distribution  of  these 
random  variables,  which  provides  information  sufficient  to  compute  the  probabilities  of  events 
involving  these  random  variables. 

In the following, suppose that $X_1, \ldots, X_n$ are random variables defined on a common probability
space; we can also represent them as an $n$-dimensional random vector $\mathbf{X} = (X_1, \ldots, X_n)^T$.

Joint Cumulative Distribution Function: The joint CDF is defined as

$$F_{\mathbf{X}}(\mathbf{x}) = F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = P[X_1 \le x_1, \ldots, X_n \le x_n]$$
Joint Probability Density Function: When the joint CDF is continuous, we can define the
joint PDF as follows:

$$p_{\mathbf{X}}(\mathbf{x}) = p_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \frac{\partial}{\partial x_1} \cdots \frac{\partial}{\partial x_n} F_{X_1, \ldots, X_n}(x_1, \ldots, x_n)$$

We can recover the joint CDF from the joint PDF by integrating:

$$F_{\mathbf{X}}(\mathbf{x}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} p_{X_1, \ldots, X_n}(u_1, \ldots, u_n)\,du_1 \cdots du_n$$

The joint PDF must be nonnegative and must integrate to one over $n$-dimensional space. The
probability of a particular subset of $n$-dimensional space is obtained by integrating the joint PDF
over the subset.

Joint Probability Mass Function (PMF): For discrete random variables, the joint PMF is
defined as

$$p_{\mathbf{X}}(\mathbf{x}) = p_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = P[X_1 = x_1, \ldots, X_n = x_n]$$

Marginal distributions: The marginal distribution for a given random variable (or set of
random variables) can be obtained by integrating or summing over all possible values of the
random variables that we are not interested in. For CDFs, this simply corresponds to setting
the appropriate arguments in the joint CDF to infinity. For example,

$$F_X(x) = P[X \le x] = P[X \le x, Y < \infty] = F_{X,Y}(x, \infty)$$

For continuous random variables, the marginal PDF is obtained from the joint PDF by “integrating
out” the undesired random variable:

$$p_X(x) = \int_{-\infty}^{\infty} p_{X,Y}(x, y)\,dy, \quad -\infty < x < \infty$$

For discrete random variables, we sum over the possible values of the undesired random variable:

$$p_X(x) = \sum_{y \in \mathcal{Y}} p_{X,Y}(x, y), \quad x \in \mathcal{X}$$

where $\mathcal{X}$ and $\mathcal{Y}$ denote the set of possible values taken by $X$ and $Y$, respectively.


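Example 5.3.1 (Joint and marginal densities): Random variables $X$ and $Y$ have joint
density given by

$$p_{X,Y}(x, y) = \begin{cases} c\,xy, & 0 \le x, y \le 1 \\ 2c\,xy, & -1 \le x, y \le 0 \\ 0, & \text{else} \end{cases}$$

where the constant $c$ is not specified.

(a) Find the value of $c$.

(b) Find $P[X + Y \le 1]$.

(c) Specify the marginal distribution of $X$.

Solution:

(a) We find the constant using the observation that the joint density must integrate to one:

$$\begin{aligned}
1 &= \int\!\!\int p_{X,Y}(x, y)\,dx\,dy = c \int_0^1 \!\! \int_0^1 xy\,dx\,dy + 2c \int_{-1}^0 \!\! \int_{-1}^0 xy\,dx\,dy \\
&= c \left(\left.\frac{x^2}{2}\right|_0^1\right)\left(\left.\frac{y^2}{2}\right|_0^1\right) + 2c \left(\left.\frac{x^2}{2}\right|_{-1}^0\right)\left(\left.\frac{y^2}{2}\right|_{-1}^0\right) = \frac{c}{4} + \frac{2c}{4} = \frac{3c}{4}
\end{aligned}$$

Thus, $c = 4/3$.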

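(b) The required probability is obtained by integrating the joint density over the shaded area in
Figure 5.8. We obtain

$$\begin{aligned}
P[X + Y \le 1] &= \int_{y=0}^{1} \int_{x=0}^{1-y} c\,xy\,dx\,dy + \int_{y=-1}^{0} \int_{x=-1}^{0} 2c\,xy\,dx\,dy \\
&= c \int_0^1 y \left.\frac{x^2}{2}\right|_0^{1-y} dy + 2c \cdot \frac{1}{4} \\
&= c \int_0^1 \frac{y(1-y)^2}{2}\,dy + \frac{2c}{4} = \frac{c}{24} + \frac{c}{2} = \frac{13c}{24} \\
&= \frac{13}{18}
\end{aligned}$$

We could have computed this probability more quickly in this example by integrating the joint
density over the unshaded area to find $P[X + Y > 1]$, since this area has a simpler shape:

$$\begin{aligned}
P[X + Y > 1] &= \int_{y=0}^{1} \int_{x=1-y}^{1} c\,xy\,dx\,dy = c \int_0^1 y \left.\frac{x^2}{2}\right|_{1-y}^{1} dy \\
&= \frac{c}{2} \int_0^1 y(2y - y^2)\,dy = \frac{5c}{24} = \frac{5}{18}
\end{aligned}$$

from which we get that $P[X + Y \le 1] = 1 - P[X + Y > 1] = 13/18$.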

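(c) The marginal density of $X$ is found by integrating the joint density over all possible values
of $Y$. For $0 \le x \le 1$, we obtain

$$p_X(x) = \int_0^1 c\,xy\,dy = c\,x \left.\frac{y^2}{2}\right|_{y=0}^{1} = \frac{c\,x}{2} = \frac{2x}{3} \tag{5.22}$$

For $-1 \le x \le 0$, we have

$$p_X(x) = \int_{-1}^{0} 2c\,xy\,dy = 2c\,x \left.\frac{y^2}{2}\right|_{y=-1}^{0} = -c\,x = -\frac{4x}{3} \tag{5.23}$$

and $p_X(x) = 0$ for $|x| > 1$.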
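The answers to Example 5.3.1 can be cross-checked numerically with a midpoint grid over $[-1, 1]^2$. This is a rough sketch, not an exact method: grid cells cut by the line $x + y = 1$ are counted all-or-nothing, so the estimate of $P[X + Y \le 1]$ is accurate only to about three decimal places. The function names are illustrative.

```python
c = 4 / 3  # from part (a)

def joint(x, y):
    """Joint density of Example 5.3.1 with c = 4/3."""
    if 0 <= x <= 1 and 0 <= y <= 1:
        return c * x * y
    if -1 <= x <= 0 and -1 <= y <= 0:
        return 2 * c * x * y
    return 0.0

N = 200                                           # grid cells per unit length
h = 1 / N
pts = [-1 + (i + 0.5) * h for i in range(2 * N)]  # cell midpoints covering [-1, 1]

def marginal_x(x):
    """Midpoint-rule approximation of p_X(x), integrating the joint density over y."""
    return h * sum(joint(x, y) for y in pts)

total = h * h * sum(joint(x, y) for x in pts for y in pts)
p_le_1 = h * h * sum(joint(x, y) for x in pts for y in pts if x + y <= 1)

print(round(total, 6))                # close to 1, consistent with c = 4/3
print(round(p_le_1, 3), 13 / 18)      # grid estimate vs 13/18 = 0.7222...
print(marginal_x(0.6), 2 * 0.6 / 3)   # (5.22): both close to 0.4
print(marginal_x(-0.6), 4 * 0.6 / 3)  # (5.23): both close to 0.8
```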


Conditional density: The conditional density of $Y$ given $X$ is defined as

$$p_{Y|X}(y \mid x) = \frac{p_{X,Y}(x, y)}{p_X(x)} \tag{5.24}$$

where the definition applies for both PDFs and PMFs, and where we are interested in values of
$x$ such that $p_X(x) > 0$. For jointly continuous $X$ and $Y$, the conditional density $p(y \mid x)$ has the
interpretation

$$p_{Y|X}(y \mid x)\,\Delta y \approx P\left[Y \in [y, y + \Delta y] \mid X \in [x, x + \Delta x]\right] \tag{5.25}$$

for $\Delta x$, $\Delta y$ small. For discrete random variables, the conditional PMF is simply the following
conditional probability:

$$p_{Y|X}(y \mid x) = P[Y = y \mid X = x] \tag{5.26}$$


Example 5.3.2 Continuing with Example 5.3.1, let us find the conditional density of $Y$ given
$X$. For $X = x \in (0, 1]$, we have $p_{X,Y}(x, y) = c\,xy$, with $0 \le y \le 1$ (the joint density is zero for
other values of $y$, under this conditioning on $X$). Applying (5.24), and substituting (5.22), we
obtain

$$p_{Y|X}(y \mid x) = \frac{p_{X,Y}(x, y)}{p_X(x)} = \frac{c\,xy}{c\,x/2} = 2y, \quad 0 \le y \le 1 \quad (\text{for } 0 < x \le 1)$$

Similarly, for $X = x \in [-1, 0)$, we obtain, using (5.23), that

$$p_{Y|X}(y \mid x) = \frac{p_{X,Y}(x, y)}{p_X(x)} = \frac{2c\,xy}{-c\,x} = -2y, \quad -1 \le y \le 0 \quad (\text{for } -1 \le x < 0)$$

We can now compute conditional probabilities using the preceding conditional densities. For
example,

$$P[Y \le -0.5 \mid X = -0.5] = \int_{-1}^{-0.5} (-2y)\,dy = \left. -y^2 \right|_{-1}^{-0.5} = 3/4$$

whereas $P[Y \le 0.5 \mid X = -0.5] = 1$ (why?).
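The conditional density in Example 5.3.2 can also be obtained mechanically from (5.24), dividing the joint density by the marginal. The sketch below does this at $x = -0.5$ and integrates numerically with a midpoint rule; `joint`, `marginal_x` and `cond_y_given_x` are hypothetical helper names, and $c = 4/3$ comes from part (a) of Example 5.3.1.

```python
c = 4 / 3  # from Example 5.3.1(a)

def joint(x, y):
    """Joint density of Example 5.3.1."""
    if 0 <= x <= 1 and 0 <= y <= 1:
        return c * x * y
    if -1 <= x <= 0 and -1 <= y <= 0:
        return 2 * c * x * y
    return 0.0

def marginal_x(x):
    """p_X(x) from (5.22) and (5.23)."""
    if 0 < x <= 1:
        return c * x / 2
    if -1 <= x < 0:
        return -c * x
    return 0.0

def cond_y_given_x(y, x):
    """(5.24): p_{Y|X}(y | x) = p_{X,Y}(x, y) / p_X(x), for x with p_X(x) > 0."""
    return joint(x, y) / marginal_x(x)

def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

x0 = -0.5
total = integrate(lambda y: cond_y_given_x(y, x0), -1.0, 0.0)  # a valid density: 1
p = integrate(lambda y: cond_y_given_x(y, x0), -1.0, -0.5)     # P[Y <= -0.5 | X = -0.5]
print(round(total, 6), round(p, 6))  # 1.0 0.75
```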


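Bayes’ rule for conditional densities: Given the conditional density of $Y$ given $X$, the
conditional density for $X$ given $Y$ is given by

$$p_{X|Y}(x \mid y) = \frac{p_{Y|X}(y \mid x)\,p_X(x)}{p_Y(y)} = \frac{p_{Y|X}(y \mid x)\,p_X(x)}{\int p_{Y|X}(y \mid x')\,p_X(x')\,dx'}, \quad \text{continuous random variables}$$

$$p_{X|Y}(x \mid y) = \frac{p_{Y|X}(y \mid x)\,p_X(x)}{p_Y(y)} = \frac{p_{Y|X}(y \mid x)\,p_X(x)}{\sum_{x'} p_{Y|X}(y \mid x')\,p_X(x')}, \quad \text{discrete random variables}$$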


We  can  also  mix  discrete  and  continuous  random  variables  in  applying  Bayes’  rule,  as  illustrated 
in  the  following  example. 


Example 5.3.3 (Conditional probability and Bayes’ rule with discrete and continuous
random variables): A bit sent by a transmitter is modeled as a random variable $X$ taking values
0 and 1 with equal probability. The corresponding observation at the receiver is modeled by a
real-valued random variable $Y$. The conditional distribution of $Y$ given $X = 0$ is $N(0, 4)$. The
conditional distribution of $Y$ given $X = 1$ is $N(10, 4)$. This might happen, for example, with
on-off signaling, where we send a signal to send 1, and send nothing when we want to send 0.
The receiver therefore sees signal plus noise if 1 is sent, and sees only noise if 0 is sent, and the
observation $Y$, presumably obtained by processing the received signal, has zero mean if 0 is sent,
and nonzero mean if 1 is sent.

(a)  Write  down  the  conditional  densities  of  Y  given  X  =  0  and  X  =  1,  respectively. 

(b)  Find  P[Y  =  7|X  =  0],  P[Y  =  7|X  =  1]  and  P[Y  =  7]. 

(c)  Find  P[Y  >  7|X  =  0]. 

(d) Find P[Y > 7|X = 1].

(e) Find P[X = 0|Y = 7].

Solution to (a): We simply plug numbers into the expression (5.21) for the Gaussian density
to obtain:

$$ p(y|x=0) = \frac{1}{\sqrt{8\pi}}\, e^{-y^2/8}, \qquad p(y|x=1) = \frac{1}{\sqrt{8\pi}}\, e^{-(y-10)^2/8} $$

Solution to (b): Conditioned on X = 0, Y is a continuous random variable, so the probability
of taking a particular value is zero. Thus, P[Y = 7|X = 0] = 0. By the same reasoning,
P[Y = 7|X = 1] = 0. The unconditional probability is given by the law of total probability:

$$ P[Y = 7] = P[Y = 7|X = 0]\,P[X = 0] + P[Y = 7|X = 1]\,P[X = 1] = 0 $$


Solution to (c): Finding the probability of Y lying in a region, conditioned on X = 0, simply
involves integrating the conditional density over that region. We therefore have

$$ P[Y > 7|X = 0] = \int_7^\infty p(y|x=0)\,dy = \int_7^\infty \frac{1}{\sqrt{8\pi}}\, e^{-y^2/8}\,dy $$

We shall see in Section 5.6 how to express such probabilities involving Gaussian densities in compact
form using standard functions (which can be evaluated using built-in functions in Matlab),
but for now, we leave the desired probability in terms of the integral given above.

Solution to (d): This is analogous to (c), except that we integrate the conditional density of
Y given X = 1:

$$ P[Y > 7|X = 1] = \int_7^\infty p(y|x=1)\,dy = \int_7^\infty \frac{1}{\sqrt{8\pi}}\, e^{-(y-10)^2/8}\,dy $$
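Although the text defers closed-form expressions to Section 5.6, the two tail integrals in (c) and (d) can already be checked numerically. The sketch below assumes only Python's standard library: it evaluates the Gaussian tail through `math.erfc`, using the identity $P[N(m, v^2) > t] = \frac{1}{2}\operatorname{erfc}\big((t - m)/\sqrt{2v^2}\big)$, and cross-checks against a midpoint-rule integration of the N(m, 4) density. The function names and the truncation point `upper` are hypothetical choices.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gaussian_tail(t, mean, var):
    """P[Y > t] for Y ~ N(mean, var), via the complementary error function."""
    return 0.5 * math.erfc((t - mean) / math.sqrt(2 * var))


def tail_by_integration(t, mean, var, upper=60.0, n=200_000):
    """Midpoint-rule integral of the density over (t, upper); beyond upper is neglected."""
    h = (upper - t) / n
    return h * sum(gaussian_pdf(t + (i + 0.5) * h, mean, var) for i in range(n))


p_c = gaussian_tail(7, mean=0, var=4)   # (c): P[Y > 7 | X = 0]
p_d = gaussian_tail(7, mean=10, var=4)  # (d): P[Y > 7 | X = 1]
print(f"{p_c:.3e} {p_d:.4f}")
print(f"{tail_by_integration(7, 0, 4):.3e} {tail_by_integration(7, 10, 4):.4f}")
```

The erfc route is the one worth keeping; the integration is there only to show that both describe the same integrals.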

Solution to (e): Now we want to apply Bayes’ rule to find P[X = 0|Y = 7]. But we know from
(b) that the event {Y = 7} has zero probability. How do we condition on an event that never
happens? The answer is that we define P[X = 0|Y = 7] to be the limit of $P[X = 0 \mid Y \in (7 - \epsilon, 7 + \epsilon)]$
as $\epsilon \to 0$. For any $\epsilon > 0$, the event that we are conditioning on, $\{Y \in (7 - \epsilon, 7 + \epsilon)\}$, has positive
probability, and we can show by methods beyond our present scope that one does get a well-defined limit as $\epsilon$
tends to zero. However, we do not need to worry about such technicalities when computing this
conditional probability: we can simply compute it (for an arbitrary value of Y = y) as

$$ P[X = 0|Y = y] = \frac{p_{Y|X}(y|0)\,P[X = 0]}{p_Y(y)} = \frac{p_{Y|X}(y|0)\,P[X = 0]}{p_{Y|X}(y|0)\,P[X = 0] + p_{Y|X}(y|1)\,P[X = 1]} $$


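Substituting the conditional densities from (a) and setting P[X = 0] = P[X = 1] = 1/2, we
obtain (the common factor $1/\sqrt{8\pi}$ cancels)

$$ P[X = 0|Y = y] = \frac{\frac{1}{2} e^{-y^2/8}}{\frac{1}{2} e^{-y^2/8} + \frac{1}{2} e^{-(y-10)^2/8}} = \frac{1}{1 + e^{5(y-5)/2}} $$

Plugging in y = 7, we obtain

$$ P[X = 0|Y = 7] = \frac{1}{1 + e^{5}} = 0.0067 $$

which of course implies that

$$ P[X = 1|Y = 7] = 1 - P[X = 0|Y = 7] = 0.9933 $$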
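The posterior computation can be reproduced in a few lines. This sketch evaluates Bayes’ rule directly, with the law of total probability in the denominator, and compares it with the simplified expression $1/(1 + e^{5(y-5)/2})$. The function names and default arguments are hypothetical; the simplified form is valid only for the equal priors, means 0 and 10, and variance 4 of this example.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior_zero(y, prior0=0.5, var=4.0, mean0=0.0, mean1=10.0):
    """P[X = 0 | Y = y] by Bayes' rule for the on-off model of Example 5.3.3."""
    num = gaussian_pdf(y, mean0, var) * prior0
    den = num + gaussian_pdf(y, mean1, var) * (1 - prior0)
    return num / den


def posterior_zero_closed_form(y):
    """Simplified form 1 / (1 + exp(5(y - 5)/2)) for equal priors, means 0 and 10, variance 4."""
    return 1.0 / (1.0 + math.exp(5 * (y - 5) / 2))


p0 = posterior_zero(7.0)
print(round(p0, 4), round(1 - p0, 4))
print(round(posterior_zero_closed_form(7.0), 4))
```

At y = 5, halfway between the two means, the posterior is exactly 1/2, as symmetry suggests.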


Before  seeing  Y,  we  knew  only  that  0  or  1  were  sent  with  equal  probability.  After  seeing  Y  =  7, 
however,  our  model  tells  us  that  1  was  far  more  likely  to  have  been  sent.  This  is  of  course  what  we 
want  in  a  reliable  communication  system:  we  begin  by  not  knowing  the  transmitted  information 
at  the  receiver  (otherwise  there  would  be  no  point  in  sending  it),  but  after  seeing  the  received 
signal,  we  can  infer  it  with  high  probability.  We  shall  see  many  more  such  computations  in  the 
next  chapter:  conditional  distributions  and  probabilities  are  fundamental  to  principled  receiver 
design. 
Independent Random Variables: Random variables $X_1, \ldots, X_n$ are independent if

$$ P[X_1 \in A_1, \ldots, X_n \in A_n] = P[X_1 \in A_1] \cdots P[X_n \in A_n] $$

for any subsets $A_1, \ldots, A_n$. That is, events defined in terms of values taken by these random
variables are independent of each other. This implies, for example, that the conditional probability
of an event defined in terms of one of these random variables, conditioned on events defined in
terms of the other random variables, equals the unconditional probability:

$$ P[X_1 \in A_1 \mid X_2 \in A_2, \ldots, X_n \in A_n] = P[X_1 \in A_1] $$

In terms of distributions and densities, independence means that joint distributions are products
of marginal distributions, and joint densities are products of marginal densities.

Joint distribution is product of marginals for independent random variables: If
$X_1, \ldots, X_n$ are independent, then their joint CDF is a product of the marginal CDFs:

$$ F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n) $$

and their joint density (PDF or PMF) is a product of the marginal densities:

$$ p_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = p_{X_1}(x_1) \cdots p_{X_n}(x_n) $$

Independent and identically distributed (i.i.d.) random variables: We are often interested
in collections of independent random variables in which each random variable has the same
marginal distribution. We call such random variables independent and identically distributed.

Example 5.3.4 (A sum of i.i.d. Bernoulli random variables is a Binomial random variable):
Let $X_1, \ldots, X_n$ denote i.i.d. Bernoulli random variables with $P[X_i = 1] = 1 - P[X_i = 0] = p$, and
let $Y = X_1 + \ldots + X_n$ denote their sum. We could think of $X_i$ denoting whether the ith coin flip (of
a possibly biased coin, if $p \neq 1/2$) yields heads, where successive flips have independent outcomes,
so that Y is the number of heads obtained in n flips. For communications applications, $X_i$ could
denote whether the ith bit in a sequence of n bits is incorrectly received, with successive bit
errors modeled as independent, so that Y is the total number of bit errors. The random variable
Y takes discrete values in {0, 1, ..., n}. Its PMF is given by

$$ P[Y = k] = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n $$

That is, Y ~ Bin(n, p). To see why, note that Y = k requires that exactly k of the $\{X_i\}$ take
value 1, with the remaining n - k taking value 0. Let us compute the probability of one such
outcome, $\{X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0\}$:

$$ \begin{aligned} P[X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0] &= P[X_1 = 1] \cdots P[X_k = 1]\,P[X_{k+1} = 0] \cdots P[X_n = 0] \\ &= p^k (1 - p)^{n-k} \end{aligned} $$

Clearly, any other outcome with exactly k ones has the same probability, given the i.i.d. nature
of the $\{X_i\}$. We can now sum over the probabilities of these mutually exclusive events, noting
that there are exactly "n choose k" such outcomes (the number of ways in which we can choose
the k random variables $\{X_i\}$ which take the value one) to obtain the desired PMF.

Density of sum of independent random variables: Suppose that $X_1$ and $X_2$ are independent
continuous random variables, and let $Y = X_1 + X_2$. Then the PDF of Y is a convolution
of the PDFs of $X_1$ and $X_2$:

$$ p_Y(y) = \int_{-\infty}^{\infty} p_{X_1}(x_1)\, p_{X_2}(y - x_1)\, dx_1 $$

For discrete random variables, the same result holds, except that the PMF is given by a discrete-time
convolution of the PMFs of $X_1$ and $X_2$.
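Example 5.3.4 and the discrete-time convolution result fit together: the PMF of $X_1 + \ldots + X_n$ can be built by convolving the Bernoulli PMF [1 - p, p] with itself n times, and the result should match the Bin(n, p) formula. A minimal sketch, assuming Python 3.8+ for `math.comb`; the function names and the values n = 10, p = 0.1 are hypothetical.

```python
from math import comb


def convolve_pmfs(p, q):
    """Discrete convolution of two PMFs supported on {0, 1, ..., len - 1}."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def binomial_pmf_by_convolution(n, p):
    """PMF of X_1 + ... + X_n for i.i.d. Bernoulli(p) variables, via repeated convolution."""
    pmf = [1.0]  # PMF of the empty sum, which is the constant 0
    for _ in range(n):
        pmf = convolve_pmfs(pmf, [1 - p, p])
    return pmf


def binomial_pmf_formula(n, p):
    """Bin(n, p) PMF from the closed-form expression in Example 5.3.4."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


n, p = 10, 0.1
by_conv = binomial_pmf_by_convolution(n, p)
by_formula = binomial_pmf_formula(n, p)
print(max(abs(a - b) for a, b in zip(by_conv, by_formula)))
```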
Figure 5.9: The sum of two independent uniform random variables has a PDF with trapezoidal
shape,  obtained  by  convolving  two  boxcar-shaped  PDFs. 


Example 5.3.5 (Sum of two uniform random variables) Suppose that $X_1$ is uniformly
distributed over [0, 1], and $X_2$ is uniformly distributed over [-1, 1]. Then $Y = X_1 + X_2$ takes
values in the interval [-1, 2], and its density is the convolution shown in Figure 5.9.
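The trapezoid in Figure 5.9 can be reproduced by evaluating the convolution integral numerically. Carrying out the convolution by hand (our own calculation, not given in the text) gives $p_Y(y) = (y+1)/2$ on [-1, 0], 1/2 on [0, 1], and $(2-y)/2$ on [1, 2]; the sketch below compares this with a midpoint-rule evaluation. The function names and grid size are hypothetical.

```python
def p_x1(x):
    """Density of X1, uniform on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def p_x2(x):
    """Density of X2, uniform on [-1, 1]."""
    return 0.5 if -1.0 <= x <= 1.0 else 0.0


def p_sum_numeric(y, n=20_000):
    """Convolution integral over x1 in [0, 1] (the support of p_x1), by the midpoint rule."""
    h = 1.0 / n
    return h * sum(p_x1((i + 0.5) * h) * p_x2(y - (i + 0.5) * h) for i in range(n))


def p_sum_trapezoid(y):
    """Closed-form trapezoid from carrying out the convolution by hand."""
    if -1.0 <= y <= 0.0:
        return (y + 1.0) / 2.0
    if 0.0 <= y <= 1.0:
        return 0.5
    if 1.0 <= y <= 2.0:
        return (2.0 - y) / 2.0
    return 0.0


for y in (-0.5, 0.5, 1.5):
    print(y, round(p_sum_numeric(y), 4), p_sum_trapezoid(y))
```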


Of  particular  interest  to  us  are  jointly  Gaussian  random  variables,  which  we  discuss  in  more 
detail  in  Section  5.6. 

Notational simplification: In the preceding definitions, we have distinguished between different
random variables by using subscripts. For example, the joint density of X and Y is denoted
by $p_{X,Y}(x,y)$, where X, Y denote the random variables, and x, y are dummy variables that we
might, for example, integrate over when evaluating a probability. We could easily use some other
notation for the dummy variables, e.g., the joint density could be denoted as $p_{X,Y}(u,v)$. After
all, we know that we are talking about the joint density of X and Y because of the subscripts.
However, carrying around the subscripts is cumbersome. Therefore, from now on, when there
is no scope for confusion, we drop the subscripts and use the dummy variables to also denote
the random variables we are talking about. For example, we now use p(x, y) as shorthand for
$p_{X,Y}(x,y)$, choosing the dummy variables to be lower case versions of the random variables they
are associated with. Similarly, we use p(x) to denote the density of X, p(y) to denote the density
of Y, and p(y|x) to denote the conditional density of Y given X. Of course, we revert to the
subscript-based notation whenever there is any possibility of confusion.


5.4  Functions  of  random  variables 


Figure  5.10:  A  function  of  a  random  variable  is  also  a  random  variable. 


We review here methods of determining the distribution of functions of random variables. If
$X = X(\omega)$ is a random variable, so is $Y(\omega) = g(X(\omega))$, since it is a mapping from the sample
space to the real line which is a composition of the original mapping X and the function g, as
shown in Figure 5.10.

Method 1 (find the CDF first): We proceed from the definition to find the CDF of Y = g(X)
as follows:

$$ F_Y(y) = P[Y \le y] = P[g(X) \le y] = P[X \in A(y)] $$

where $A(y) = \{x : g(x) \le y\}$. We can now use the CDF or density of X to evaluate the extreme
right-hand side. Once we find the CDF of Y, we can find the PMF or PDF as usual.


Figure 5.11: Finding the CDF of $Y = X^2$ (the figure marks the range of X corresponding to $Y \le y$).


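Example 5.4.1 (Application of Method 1) Suppose that X is a Laplacian random variable
with density

$$ p_X(x) = \frac{1}{2} e^{-|x|}, \quad -\infty < x < \infty $$

Find the CDF and PDF of $Y = X^2$.

In Method 1, we find the CDF of Y first, and then differentiate to find the PDF. First, note that
Y takes only nonnegative values, so that $F_Y(y) = 0$ for y < 0. For $y \ge 0$, we have

$$ \begin{aligned} F_Y(y) &= P[Y \le y] = P[X^2 \le y] = P[-\sqrt{y} \le X \le \sqrt{y}] \\ &= \int_{-\sqrt{y}}^{\sqrt{y}} p_X(x)\,dx = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{2} e^{-|x|}\,dx = \int_0^{\sqrt{y}} e^{-x}\,dx \\ &= 1 - e^{-\sqrt{y}}, \quad y \ge 0 \end{aligned} $$

We can now differentiate the CDF to obtain the PDF of Y:

$$ p_Y(y) = \frac{d}{dy} F_Y(y) = \frac{e^{-\sqrt{y}}}{2\sqrt{y}}, \quad y > 0 $$

(The CDF and PDF are zero for y < 0, since Y only takes nonnegative values.)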
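A Monte Carlo check of Example 5.4.1, as a sketch: it assumes that a Laplacian sample can be generated as an Exp(1) magnitude with an independent random sign (which reproduces $p_X(x) = \frac{1}{2}e^{-|x|}$), and compares the empirical CDF of $X^2$ with $1 - e^{-\sqrt{y}}$. The seed, sample size, and function names are arbitrary illustrative choices.

```python
import math
import random


def sample_laplacian(rng):
    """One draw from p_X(x) = (1/2) exp(-|x|): an Exp(1) magnitude with a random sign."""
    magnitude = rng.expovariate(1.0)
    return magnitude if rng.random() < 0.5 else -magnitude


def empirical_cdf_of_square(y, num_samples=100_000, seed=1):
    """Monte Carlo estimate of F_Y(y) = P[X^2 <= y] for Laplacian X."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(num_samples) if sample_laplacian(rng) ** 2 <= y)
    return hits / num_samples


for y in (0.25, 1.0, 4.0):
    estimate = empirical_cdf_of_square(y)
    exact = 1 - math.exp(-math.sqrt(y))
    print(y, round(estimate, 3), round(exact, 3))
```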
Method 2 (find the PDF directly): For differentiable g(x) and continuous random variables,
we can compute the PDF directly. Suppose that g(x) = y is satisfied for $x = x_1, \ldots, x_m$. We
can then express $x_i$ as a function of y: $x_i = h_i(y)$. For $g(x) = x^2$, this corresponds to $x_1 = \sqrt{y}$
and $x_2 = -\sqrt{y}$. The probability of X lying in a small interval $[x_i, x_i + \Delta x]$ is approximately
$p_X(x_i)\,\Delta x$, where we take the increment $\Delta x > 0$. For smooth g, this corresponds to Y lying in a
small interval around y, where we need to sum up the probabilities corresponding to all possible
values of x that get us near the desired value of y. We therefore get

$$ p_Y(y)\,|\Delta y| = \sum_{i=1}^{m} p_X(x_i)\,\Delta x $$

where we take the magnitude of the Y increment $\Delta y$ because a positive increment in x can cause
a positive or negative increment in g(x), depending on the slope at that point. We therefore
obtain

$$ p_Y(y) = \sum_{i=1}^{m} \left. \frac{p_X(x_i)}{|dy/dx|} \right|_{x_i = h_i(y)} \tag{5.27} $$


We  now  redo  Example  5.4.1  using  Method  2. 


Example 5.4.2 (Application of Method 2) For the setting of Example 5.4.1, we wish to find
the PDF using Method 2. For $y = g(x) = x^2$, we have $x = \pm\sqrt{y}$ (we only consider y > 0, since
the PDF is zero for y < 0), with derivative dy/dx = 2x. We can now apply (5.27) to obtain:

$$ p_Y(y) = \frac{p_X(\sqrt{y})}{|2\sqrt{y}|} + \frac{p_X(-\sqrt{y})}{|-2\sqrt{y}|} = \frac{e^{-\sqrt{y}}}{2\sqrt{y}}, \quad y > 0 $$

as before.
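Formula (5.27) translates directly into code once the roots $h_i(y)$ and the derivative dy/dx are supplied. The sketch below, with hypothetical function names, applies it to Example 5.4.2 and compares the result with a central-difference derivative of the Method 1 CDF from Example 5.4.1; the step size `eps` is an arbitrary choice.

```python
import math


def laplacian_pdf(x):
    """p_X(x) = (1/2) exp(-|x|), the Laplacian density of Example 5.4.1."""
    return 0.5 * math.exp(-abs(x))


def pdf_by_method2(y, p_x, roots, dydx):
    """Formula (5.27): sum of p_X(x_i) / |dy/dx at x_i| over the roots x_i = h_i(y)."""
    return sum(p_x(x) / abs(dydx(x)) for x in roots(y))


def square_roots(y):
    """The two solutions of x^2 = y for y > 0."""
    r = math.sqrt(y)
    return (r, -r)


def cdf_method1(y):
    """F_Y(y) = 1 - exp(-sqrt(y)) for y >= 0, from Example 5.4.1."""
    return 1.0 - math.exp(-math.sqrt(y)) if y > 0 else 0.0


eps = 1e-6  # step for the central-difference derivative
for y in (0.5, 1.0, 3.0):
    via_method2 = pdf_by_method2(y, laplacian_pdf, square_roots, lambda x: 2 * x)
    via_cdf = (cdf_method1(y + eps) - cdf_method1(y - eps)) / (2 * eps)
    print(y, round(via_method2, 6), round(via_cdf, 6))
```

Passing the roots and the derivative as functions keeps the sketch reusable for other monotone or many-to-one maps g.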


Since Method 1 starts from the definition of CDF, it generalizes to multiple random variables
(i.e., random vectors) in a straightforward manner, at least in principle. For example, suppose
that $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$. Then the joint CDF of $Y_1$ and $Y_2$ is given by

$$ F_{Y_1,Y_2}(y_1, y_2) = P[Y_1 \le y_1, Y_2 \le y_2] = P[g_1(X_1,X_2) \le y_1,\ g_2(X_1,X_2) \le y_2] = P[(X_1, X_2) \in A(y_1, y_2)] $$

where $A(y_1, y_2) = \{(x_1, x_2) : g_1(x_1,x_2) \le y_1,\ g_2(x_1,x_2) \le y_2\}$. In principle, we can now use the
joint distribution to compute the preceding probability for each possible value of $(y_1, y_2)$. In
general, Method 1 works for $\mathbf{Y} = g(\mathbf{X})$, where $\mathbf{Y}$ is an n-dimensional random vector which is
a function of an m-dimensional random vector $\mathbf{X}$ (in the preceding, we considered m = n = 2).
However, evaluating probabilities involving m-dimensional random vectors can get pretty
complicated even for m = 2. A generalization of Method 2 is often preferred as a way of directly
obtaining PDFs when the functions involved are smooth enough, and when m = n. We review
this next.

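Method 2 for random vectors: Suppose that $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ is an n × 1 random vector
which is a function of another n × 1 random vector $\mathbf{X} = (X_1, \ldots, X_n)^T$. That is, $\mathbf{Y} = g(\mathbf{X})$, or
$Y_k = g_k(X_1, \ldots, X_n)$, k = 1, ..., n. As before, suppose that $\mathbf{y} = g(\mathbf{x})$ has m solutions, $\mathbf{x}_1, \ldots, \mathbf{x}_m$, with the
ith solution written in terms of $\mathbf{y}$ as $\mathbf{x}_i = h_i(\mathbf{y})$. The probability of $\mathbf{Y}$ lying in an infinitesimal
volume is now given by

$$ p_{\mathbf{Y}}(\mathbf{y})\,|d\mathbf{y}| = \sum_{i=1}^{m} p_{\mathbf{X}}(\mathbf{x}_i)\,|d\mathbf{x}| $$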
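In order to relate the volumes $|d\mathbf{y}|$ and $|d\mathbf{x}|$ of the vector increments, it no longer suffices to
consider a scalar derivative. We now need the Jacobian matrix of partial derivatives of $\mathbf{y} = g(\mathbf{x})$
with respect to $\mathbf{x}$, defined as:

$$ J(\mathbf{y};\mathbf{x}) = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_n} \end{pmatrix} \tag{5.28} $$

The volumes of the vector increments are related as

$$ |d\mathbf{y}| = |\det(J(\mathbf{y};\mathbf{x}))|\,|d\mathbf{x}| $$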


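where det(M) denotes the determinant of a square matrix M. Thus, if $\mathbf{y} = g(\mathbf{x})$ has m solutions,
$\mathbf{x}_1, \ldots, \mathbf{x}_m$, with the ith solution written in terms of $\mathbf{y}$ as $\mathbf{x}_i = h_i(\mathbf{y})$, then the density at $\mathbf{y}$ is
given by

$$ p_{\mathbf{Y}}(\mathbf{y}) = \sum_{i=1}^{m} \left. \frac{p_{\mathbf{X}}(\mathbf{x}_i)}{|\det(J(\mathbf{y};\mathbf{x}))|} \right|_{\mathbf{x} = \mathbf{x}_i = h_i(\mathbf{y})} \tag{5.29} $$

Depending on how the functional relationship between $\mathbf{X}$ and $\mathbf{Y}$ is specified, it might sometimes
be more convenient to find the Jacobian of $\mathbf{x}$ with respect to $\mathbf{y}$:

$$ J(\mathbf{x};\mathbf{y}) = \begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \cdots & \frac{\partial x_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial x_n}{\partial y_1} & \cdots & \frac{\partial x_n}{\partial y_n} \end{pmatrix} \tag{5.30} $$

We can use this in (5.29) by noting that the two Jacobian matrices for a given pair of values $(\mathbf{x}, \mathbf{y})$
are inverses of each other:

$$ J(\mathbf{x};\mathbf{y}) = (J(\mathbf{y};\mathbf{x}))^{-1} $$

This implies that their determinants are reciprocals of each other:

$$ \det(J(\mathbf{x};\mathbf{y})) = \frac{1}{\det(J(\mathbf{y};\mathbf{x}))} $$

We can therefore rewrite (5.29) as follows:

$$ p_{\mathbf{Y}}(\mathbf{y}) = \sum_{i=1}^{m} \left. p_{\mathbf{X}}(\mathbf{x}_i)\,|\det(J(\mathbf{x};\mathbf{y}))| \right|_{\mathbf{x} = \mathbf{x}_i = h_i(\mathbf{y})} \tag{5.31} $$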


Example 5.4.3 (Rectangular to Polar Transformation): For random variables $X_1, X_2$
with joint density $p_{X_1,X_2}$, think of $(X_1, X_2)$ as a point in two-dimensional space in Cartesian
coordinates. The corresponding polar coordinates are given by

$$ R = \sqrt{X_1^2 + X_2^2}, \qquad \Phi = \tan^{-1}\frac{X_2}{X_1} \tag{5.32} $$

(a) Find the general expression for the joint density $p_{R,\Phi}$.

(b) Specialize to a situation in which $X_1$ and $X_2$ are i.i.d. N(0, 1) random variables.

Solution, part (a): Finding the Jacobian involves taking partial derivatives in (5.32). However,
in this setting, taking the Jacobian the other way around, as in (5.30), is simpler:

$$ x_1 = r\cos\phi, \qquad x_2 = r\sin\phi $$

so that

$$ J(\text{rect}; \text{polar}) = \begin{pmatrix} \frac{\partial x_1}{\partial r} & \frac{\partial x_1}{\partial \phi} \\ \frac{\partial x_2}{\partial r} & \frac{\partial x_2}{\partial \phi} \end{pmatrix} = \begin{pmatrix} \cos\phi & -r\sin\phi \\ \sin\phi & r\cos\phi \end{pmatrix} $$

We see that

$$ \det(J(\text{rect}; \text{polar})) = r\cos^2\phi + r\sin^2\phi = r $$

Noting that the rectangular-polar transformation is one-to-one, we have from (5.31) that

$$ \begin{aligned} p_{R,\Phi}(r, \phi) &= \left. p_{X_1,X_2}(x_1, x_2)\,|\det(J(\text{rect}; \text{polar}))| \right|_{x_1 = r\cos\phi,\ x_2 = r\sin\phi} \\ &= r\,p_{X_1,X_2}(r\cos\phi, r\sin\phi), \quad r \ge 0,\ 0 \le \phi \le 2\pi \end{aligned} \tag{5.33} $$

Solution, part (b): For $X_1, X_2$ i.i.d. N(0, 1), we have


$$ p_{X_1,X_2}(x_1, x_2) = p_{X_1}(x_1)\,p_{X_2}(x_2) = \frac{1}{\sqrt{2\pi}} e^{-x_1^2/2}\,\frac{1}{\sqrt{2\pi}} e^{-x_2^2/2} $$

Plugging into (5.33) and simplifying, we obtain

$$ p_{R,\Phi}(r, \phi) = \frac{r}{2\pi} e^{-r^2/2}, \quad r \ge 0,\ 0 \le \phi \le 2\pi $$

We can find the marginal densities of R and Φ by integrating out the other variable, but in this
case, we can find them by inspection, since the joint density clearly decomposes into a product of
functions of r and φ alone. With appropriate normalization, each of these functions is a marginal
density. We can now infer that R and Φ are independent, with

$$ p_R(r) = r e^{-r^2/2}, \quad r \ge 0 $$

and

$$ p_\Phi(\phi) = \frac{1}{2\pi}, \quad 0 \le \phi \le 2\pi $$

The amplitude R in this case follows a Rayleigh distribution, while the phase Φ is uniformly
distributed over [0, 2π].
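A simulation makes the Rayleigh/uniform conclusion concrete. This sketch draws i.i.d. N(0, 1) pairs, converts them to polar form, and compares empirical CDFs with the Rayleigh CDF $1 - e^{-t^2/2}$ (obtained by integrating $p_R$, our own step) and the uniform phase CDF $\phi/(2\pi)$. It uses `math.atan2` as the four-quadrant version of $\tan^{-1}(X_2/X_1)$ so that Φ covers the full circle; that, the seed, and the sample size are implementation assumptions.

```python
import math
import random

rng = random.Random(7)
num_samples = 100_000
r_values, phi_values = [], []
for _ in range(num_samples):
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # X1, X2 i.i.d. N(0, 1)
    r_values.append(math.hypot(x1, x2))                 # R = sqrt(X1^2 + X2^2)
    # atan2 gives the four-quadrant angle in (-pi, pi]; map it to [0, 2*pi)
    phi_values.append(math.atan2(x2, x1) % (2 * math.pi))


def empirical_cdf(samples, t):
    """Fraction of samples that are <= t."""
    return sum(1 for s in samples if s <= t) / len(samples)


# Rayleigh CDF: integrating r exp(-r^2/2) from 0 to t gives 1 - exp(-t^2/2)
for t in (0.5, 1.0, 2.0):
    print(t, round(empirical_cdf(r_values, t), 3), round(1 - math.exp(-t * t / 2), 3))

# Uniform phase: P[Phi <= phi] = phi / (2*pi)
for phi in (math.pi / 2, math.pi, 3 * math.pi / 2):
    print(round(phi, 3), round(empirical_cdf(phi_values, phi), 3), round(phi / (2 * math.pi), 3))

# Independence: P[R <= 1, Phi <= pi] should be close to P[R <= 1] * P[Phi <= pi]
joint = sum(1 for r, p in zip(r_values, phi_values) if r <= 1.0 and p <= math.pi) / num_samples
print(round(joint, 3), round((1 - math.exp(-0.5)) * 0.5, 3))
```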


5.5  Expectation 

We  now  discuss  computation  of  statistical  averages,  which  are  often  the  performance  measures 
based  on  which  a  system  design  is  evaluated. 

Expectation: The expectation, or statistical average, of a function of a random variable X is
defined as

$$ \begin{aligned} E[g(X)] &= \int g(x)\,p(x)\,dx, && \text{continuous random variable} \\ E[g(X)] &= \sum_x g(x)\,p(x), && \text{discrete random variable} \end{aligned} \tag{5.34} $$

Note that the expectation of a deterministic constant is simply the constant itself.
Expectation is a linear operator: We have

$$ E[a_1 W_1 + a_2 W_2 + b] = a_1 E[W_1] + a_2 E[W_2] + b $$

where $a_1$, $a_2$, b are any constants.
Mean: The mean of a random variable X is E[X].

Variance: The variance of a random variable X is a measure of how much it fluctuates around
its mean:

$$ \mathrm{var}(X) = E[(X - E[X])^2] \tag{5.35} $$

Expanding out the square, we have

$$ \mathrm{var}(X) = E[X^2 - 2X E[X] + (E[X])^2] $$

Using the linearity of expectation, we can simplify to obtain the following alternative formula
for variance:

$$ \mathrm{var}(X) = E[X^2] - (E[X])^2 \tag{5.36} $$

The square root of the variance is called the standard deviation.

Effect of Scaling and Translation: For Y = aX + b, it is left as an exercise to show that

$$ \begin{aligned} E[Y] &= E[aX + b] = aE[X] + b \\ \mathrm{var}(Y) &= a^2\,\mathrm{var}(X) \end{aligned} \tag{5.37} $$

Normalizing to zero mean and unit variance: We can specialize (5.37) to $Y = \frac{X - E[X]}{\sqrt{\mathrm{var}(X)}}$, to
see that E[Y] = 0 and var(Y) = 1.


Example 5.5.1 (PDF after scaling and translation): If X has density $p_X(x)$, then
Y = (X - a)/b has density

$$ p_Y(y) = |b|\,p_X(by + a) \tag{5.38} $$

This follows from a straightforward application of Method 2 in Section 5.4. Specializing to a
Gaussian random variable $X \sim N(m, v^2)$ with mean m and variance $v^2$ (Example 5.5.3 verifies
that these parameters are indeed the mean and variance), consider a normalized version Y = (X - m)/v.
Applying (5.38) to the Gaussian density, we obtain:

$$ p_Y(y) = v\,\frac{1}{\sqrt{2\pi v^2}}\, e^{-(vy + m - m)^2/(2v^2)} = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2} $$

which can be recognized as an N(0, 1) density. Thus, if $X \sim N(m, v^2)$, then $Y = \frac{X - m}{v} \sim N(0, 1)$
is a standard Gaussian random variable. This enables us to express probabilities involving
Gaussian random variables compactly in terms of the CDF and CCDF of a standard Gaussian
random variable, as we see later when we deal extensively with Gaussian random variables when
modeling digital communication systems.


Moments: The nth moment of a random variable X is defined as $E[X^n]$. From (5.36), we see
that specifying the mean and variance is equivalent to specifying the first and second moments.
Indeed, it is worth rewriting (5.36) as an explicit reminder that the second moment is the sum
of the squared mean and the variance:

$$ E[X^2] = (E[X])^2 + \mathrm{var}(X) \tag{5.39} $$


Example 5.5.2 (Moments of an exponential random variable): Suppose that X ~ Exp(λ).
We compute its mean using integration by parts, as follows:

$$ E[X] = \int_0^\infty x \lambda e^{-\lambda x}\,dx = \left. -x e^{-\lambda x} \right|_0^\infty + \int_0^\infty e^{-\lambda x}\,dx = \int_0^\infty e^{-\lambda x}\,dx = \frac{1}{\lambda} \tag{5.40} $$

Similarly, using integration by parts twice, we can show that

$$ E[X^2] = \frac{2}{\lambda^2} $$

Using (5.36), we obtain

$$ \mathrm{var}(X) = E[X^2] - (E[X])^2 = \frac{1}{\lambda^2} \tag{5.41} $$

In general, we can use repeated integration by parts to evaluate higher moments of the exponential
random variable to obtain

$$ E[X^n] = \int_0^\infty x^n \lambda e^{-\lambda x}\,dx = \frac{n!}{\lambda^n}, \quad n = 1, 2, 3, \ldots $$

(A proof of the preceding formula using mathematical induction is left as an exercise.)
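The moment formula $E[X^n] = n!/\lambda^n$ can be spot-checked by numerical integration. A minimal sketch: the truncation point 60/λ and the number of midpoint steps are assumptions chosen so that the neglected tail and discretization error are negligible for small n, and λ = 2 is an arbitrary test value. The last lines recover (5.41) from the first two moments.

```python
import math


def exponential_moment(n, lam, steps=100_000):
    """Midpoint-rule estimate of E[X^n] = integral over x >= 0 of x^n lam e^{-lam x},
    truncated at x = 60/lam (assumed negligible tail for small n)."""
    upper = 60.0 / lam
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x**n * lam * math.exp(-lam * x)
    return total * h


lam = 2.0
for n in range(1, 5):
    print(n, round(exponential_moment(n, lam), 6), math.factorial(n) / lam**n)

# Variance from (5.36): E[X^2] - (E[X])^2 should equal 1/lam^2, as in (5.41)
m1, m2 = exponential_moment(1, lam), exponential_moment(2, lam)
print(round(m2 - m1**2, 6), 1 / lam**2)
```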


As  a  natural  follow-up  to  the  computations  in  the  preceding  example,  let  us  introduce  the  gamma 
function,  which  is  useful  for  evaluating  integrals  associated  with  expectation  computations  for 
several  important  random  variables. 

Gamma function: The Gamma function, Γ(x), is defined as

$$ \Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt, \quad x > 0 $$

In general, integration by parts can be used to show that

$$ \Gamma(x + 1) = x\,\Gamma(x), \quad x > 0 \tag{5.42} $$

Noting that Γ(1) = 1, we can now use induction to specify the Gamma function for integer
arguments:

$$ \Gamma(n) = (n - 1)!, \quad n = 1, 2, 3, \ldots \tag{5.43} $$

This is exactly the same computation as we did in Example 5.5.2: Γ(n) equals the (n - 1)th
moment of an exponential random variable with λ = 1 (and hence mean 1/λ = 1).

The Gamma function can also be computed for non-integer arguments. Just as integer arguments
of the Gamma function are useful for exponential random variables, "integer-plus-half" arguments
are useful for evaluating the moments of Gaussian random variables. We can evaluate these using
(5.42), given the value of the Gamma function at x = 1/2:

$$ \Gamma(1/2) = \int_0^\infty t^{-1/2} e^{-t}\,dt = \sqrt{\pi} $$

For example, we can infer that

$$ \Gamma(5/2) = (3/2)(1/2)\,\Gamma(1/2) = \frac{3}{4}\sqrt{\pi} \tag{5.44} $$
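Python's standard library exposes the Gamma function as `math.gamma`, which makes (5.42) through (5.44) easy to confirm numerically. A short sketch; the test point x = 1.7 is arbitrary.

```python
import math

# (5.43): Gamma(n) = (n - 1)! for positive integers n
integer_checks = [(n, math.gamma(n), math.factorial(n - 1)) for n in range(1, 8)]

# Gamma(1/2) = sqrt(pi); (5.42) applied twice gives Gamma(5/2) = (3/4) sqrt(pi), as in (5.44)
half = (math.gamma(0.5), math.sqrt(math.pi))
five_halves = (math.gamma(2.5), 0.75 * math.sqrt(math.pi))

# (5.42) at an arbitrary non-integer point
x = 1.7
recursion = (math.gamma(x + 1), x * math.gamma(x))

print(integer_checks[-1], half, five_halves, recursion)
```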


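Example 5.5.3 (Mean and variance of a Gaussian random variable): We now show that
$X \sim N(m, v^2)$ has mean m and variance $v^2$. The mean of X is given by the following expression:

$$ E[X] = \int_{-\infty}^{\infty} x\,\frac{1}{\sqrt{2\pi v^2}}\, e^{-\frac{(x-m)^2}{2v^2}}\,dx $$

Let us first consider the change of variables t = (x - m)/v, so that dx = v dt. Then

$$ E[X] = \int_{-\infty}^{\infty} (tv + m)\,\frac{1}{\sqrt{2\pi v^2}}\, e^{-t^2/2}\,v\,dt = \int_{-\infty}^{\infty} (tv + m)\,\frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt $$

Note that $t e^{-t^2/2}$ is an odd function, and therefore integrates out to zero over the real line. We
therefore obtain

$$ E[X] = m \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt = m $$

recognizing that the integrand on the right-hand side is the N(0, 1) PDF, which must
integrate to one. The variance is given by

$$ \mathrm{var}(X) = E[(X - m)^2] = \int_{-\infty}^{\infty} (x - m)^2\,\frac{1}{\sqrt{2\pi v^2}}\, e^{-\frac{(x-m)^2}{2v^2}}\,dx $$

With a change of variables t = (x - m)/v as before, we obtain

$$ \mathrm{var}(X) = v^2 \int_{-\infty}^{\infty} t^2\,\frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt = 2v^2 \int_0^\infty t^2\,\frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt $$

since the integrand is an even function of t. Substituting $z = t^2/2$, so that $dz = t\,dt = \sqrt{2z}\,dt$,
we obtain

$$ \mathrm{var}(X) = 2v^2 \int_0^\infty \frac{2z}{\sqrt{2\pi}}\, e^{-z}\,\frac{dz}{\sqrt{2z}} = \frac{2v^2}{\sqrt{\pi}} \int_0^\infty z^{1/2} e^{-z}\,dz = \frac{2v^2}{\sqrt{\pi}}\,\Gamma(3/2) = v^2 $$

since $\Gamma(3/2) = (1/2)\,\Gamma(1/2) = \sqrt{\pi}/2$.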
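As a numerical companion to Example 5.5.3, the sketch below integrates the $N(m, v^2)$ density against x and $(x - m)^2$ with the midpoint rule. The integration window of ±12 standard deviations, the step count, and the test values m = -5, v = 2 are illustrative assumptions.

```python
import math


def gaussian_moments(m, v, steps=100_000, width=12.0):
    """Midpoint-rule estimates of E[X] and E[(X - m)^2] for X ~ N(m, v^2),
    integrating over [m - width*v, m + width*v] (truncation is an assumption)."""
    a, b = m - width * v, m + width * v
    h = (b - a) / steps
    norm = 1.0 / math.sqrt(2 * math.pi * v * v)
    mean = second_central = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        weight = norm * math.exp(-(x - m) ** 2 / (2 * v * v)) * h  # p(x) dx
        mean += x * weight
        second_central += (x - m) ** 2 * weight
    return mean, second_central


mean, variance = gaussian_moments(m=-5.0, v=2.0)
print(round(mean, 6), round(variance, 6))
```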

The change of variables in the computations in the preceding example is actually equivalent
to transforming the N(m, v^2) random variable that we started with to a standard Gaussian
N(0, 1) random variable as in Example 5.5.1. As we mentioned earlier (this is important enough
to be worth repeating), when we handle Gaussian random variables more extensively in later
chapters, we prefer making the transformation up front when computing probabilities, rather
than changing variables inside integrals.

As a final example, we show that the mean of a Poisson random variable with parameter λ is
equal to λ.


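Example 5.5.4 (Mean of a Poisson random variable): The mean is given by

$$ E[X] = \sum_{k=0}^{\infty} k\,P[X = k] = \sum_{k=1}^{\infty} k\,\frac{\lambda^k}{k!}\, e^{-\lambda} $$

where we have dropped the k = 0 term from the extreme right-hand side, since it does not
contribute to the mean. Noting that $\frac{k}{k!} = \frac{1}{(k-1)!}$ for $k \ge 1$, we have

$$ E[X] = \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!}\, e^{-\lambda} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda $$

since

$$ \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \sum_{l=0}^{\infty} \frac{\lambda^l}{l!} = e^{\lambda} $$

where we set l = k - 1 to get an easily recognized form for the series expansion of an exponential.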
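Truncating the series in Example 5.5.4 after enough terms should reproduce λ. A sketch, with the number of retained terms (100) an assumption that is ample for the moderate λ values tried; the PMF is updated recursively, $P[X = k] = P[X = k-1]\,\lambda/k$, to avoid forming large factorials.

```python
import math


def poisson_mean_partial(lam, num_terms=100):
    """Partial sum of k * lam^k e^{-lam} / k! for k = 1, ..., num_terms - 1."""
    pmf = math.exp(-lam)  # P[X = 0]
    total = 0.0
    for k in range(1, num_terms):
        pmf *= lam / k  # now P[X = k]
        total += k * pmf
    return total


for lam in (0.5, 3.0, 10.0):
    print(lam, poisson_mean_partial(lam))
```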
5.5.1 Expectation for random vectors

So far, we have talked about expectations involving a single random variable. Expectations
with multiple random variables are defined in exactly the same way as in (5.34), replacing the
scalar random variable and the corresponding dummy variable for summation or integration by
a vector.

$$ \begin{aligned} E[g(\mathbf{X})] = E[g(X_1, \ldots, X_n)] &= \int g(\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x} = \int \cdots \int g(x_1, \ldots, x_n)\,p(x_1, \ldots, x_n)\,dx_1 \cdots dx_n, && \text{jointly continuous random variables} \\ E[g(\mathbf{X})] = E[g(X_1, \ldots, X_n)] &= \sum_{\mathbf{x}} g(\mathbf{x})\,p(\mathbf{x}) = \sum_{x_1} \cdots \sum_{x_n} g(x_1, \ldots, x_n)\,p(x_1, \ldots, x_n), && \text{discrete random variables} \end{aligned} \tag{5.45} $$

Product of expectations for independent random variables: When the random variables
involved are independent, and the function whose expectation is to be evaluated decomposes
into a product of functions of each individual random variable, then the preceding computation
involves a product of expectations, each involving only one random variable:

$$ E[g_1(X_1) \cdots g_n(X_n)] = E[g_1(X_1)] \cdots E[g_n(X_n)], \quad X_1, \ldots, X_n \text{ independent} \tag{5.46} $$

Example 5.5.5 (Computing an expectation involving independent random variables):

Suppose that $X_1 \sim N(1, 1)$ and $X_2 \sim N(-3, 4)$ are independent. Find $E[(X_1 + X_2)^2]$.

Solution: We have

$$ E[(X_1 + X_2)^2] = E[X_1^2 + X_2^2 + 2X_1X_2] $$

We can now use linearity to compute the expectations of each of the three terms on the right-hand
side separately. We obtain $E[X_1^2] = (E[X_1])^2 + \mathrm{var}(X_1) = 1^2 + 1 = 2$, $E[X_2^2] = (E[X_2])^2 + \mathrm{var}(X_2) = (-3)^2 + 4 = 13$,
and $E[2X_1X_2] = 2E[X_1]E[X_2] = 2(1)(-3) = -6$, so that

$$ E[(X_1 + X_2)^2] = 2 + 13 - 6 = 9 $$
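A Monte Carlo estimate of $E[(X_1 + X_2)^2]$ should land near 9. Note that Python's `random.Random.gauss(mu, sigma)` takes the standard deviation, not the variance, so $X_2 \sim N(-3, 4)$ is drawn with sigma = 2. The seed and sample size are arbitrary. The second computation is an alternative route of our own: it uses (5.39) together with var(X_1 + X_2) = var(X_1) + var(X_2), which holds because independent random variables are uncorrelated.

```python
import random

rng = random.Random(2024)
num_samples = 200_000
total = 0.0
for _ in range(num_samples):
    x1 = rng.gauss(1.0, 1.0)   # X1 ~ N(1, 1): mean 1, standard deviation 1
    x2 = rng.gauss(-3.0, 2.0)  # X2 ~ N(-3, 4): mean -3, standard deviation 2
    total += (x1 + x2) ** 2
estimate = total / num_samples

# Alternative route via (5.39): E[S^2] = (E[S])^2 + var(S) for S = X1 + X2,
# with var(S) = var(X1) + var(X2) because independent variables are uncorrelated
exact = (1.0 + (-3.0)) ** 2 + (1.0 + 4.0)
print(round(estimate, 2), exact)
```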

Variance is a measure of how a random variable fluctuates around its mean. Covariance, defined
next, is a measure of how the fluctuations of two random variables around their means are
correlated.

Covariance: The covariance of $X_1$ and $X_2$ is defined as

$$ \mathrm{cov}(X_1, X_2) = E[(X_1 - E[X_1])(X_2 - E[X_2])] \tag{5.47} $$

As with variance, we can also obtain the following alternative formula:

$$ \mathrm{cov}(X_1, X_2) = E[X_1X_2] - E[X_1]E[X_2] \tag{5.48} $$

Variance is the covariance of a random variable with itself: It is immediate from the
definition that

$$ \mathrm{var}(X) = \mathrm{cov}(X, X) $$

Uncorrelated random variables: Random variables $X_1$ and $X_2$ are said to be uncorrelated
if $\mathrm{cov}(X_1, X_2) = 0$.

Independent random variables are uncorrelated: If $X_1$ and $X_2$ are independent, then they
are uncorrelated.

This is easy to see from (5.48), since $E[X_1X_2] = E[X_1]E[X_2]$ using (5.46).
Uncorrelated random variables need not be independent: Consider X ~ N(0, 1) and
$Y = X^2$. We see that $E[XY] = E[X^3] = 0$ by the symmetry of the N(0, 1) density around
the origin, so that

$$ \mathrm{cov}(X, Y) = E[XY] - E[X]E[Y] = 0 $$

Clearly, X and Y are not independent, since knowing the value of X determines the value of Y.
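The example can be simulated: the sample covariance of X and $Y = X^2$ comes out near zero, while conditioning on |X| > 2 forces Y > 4, which plainly exposes the dependence. A sketch with an arbitrary seed and sample size; the event |X| > 2 is just one convenient illustration.

```python
import random

rng = random.Random(11)
xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
ys = [x * x for x in xs]  # Y = X^2 is a deterministic function of X
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
# Sample version of (5.48): cov(X, Y) = E[XY] - E[X]E[Y]
cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
print(f"sample cov(X, Y) = {cov_xy:.4f}")

# Dependence: conditioning on |X| > 2 changes the distribution of Y drastically
p_y_big = sum(1 for y in ys if y > 4) / n
big_x = [y for x, y in zip(xs, ys) if abs(x) > 2]
p_y_big_given_x = sum(1 for y in big_x if y > 4) / len(big_x)
print(f"P[Y > 4] ~ {p_y_big:.3f}, P[Y > 4 | |X| > 2] = {p_y_big_given_x:.3f}")
```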

As we discuss in the next section, uncorrelated jointly Gaussian random variables are indeed
independent. The joint distribution of such random variables is determined by means and
covariances, hence we also postpone more detailed discussion of covariance computation until our
study of joint Gaussianity.


5.6  Gaussian  Random  Variables 


We begin by repeating the definition of a Gaussian random variable.

Gaussian random variable: The random variable X is said to follow a Gaussian, or normal,
distribution if its density is of the form:

$$ p(x) = \frac{1}{\sqrt{2\pi v^2}} \exp\left(-\frac{(x - m)^2}{2v^2}\right), \quad -\infty < x < \infty \tag{5.49} $$

where m = E[X] is the mean of X, and $v^2 = \mathrm{var}(X)$ is the variance of X. The Gaussian density
is therefore completely characterized by its mean and variance.

Notation for Gaussian distribution: We use $N(m, v^2)$ to denote a Gaussian distribution
with mean m and variance $v^2$, and use the shorthand $X \sim N(m, v^2)$ to denote that a random
variable X follows this distribution.

We have already noted the characteristic bell shape of the Gaussian PDF in the example plotted
in Figure 5.5: the bell is centered around the mean, and its width is determined by the variance.
We now develop a detailed framework for efficient computations involving Gaussian random
variables.

Standard Gaussian random variable: A zero mean, unit variance Gaussian random variable,
X ~ N(0, 1), is termed a standard Gaussian random variable.

An important property of Gaussian random variables is that they remain Gaussian under scaling
and translation. Suppose that $X \sim N(m, v^2)$. Define Y = aX + b, where a, b are constants
(assume $a \neq 0$ to avoid triviality). The density of Y can be found as follows:

$$ p(y) = \left. \frac{p(x)}{|dy/dx|} \right|_{x = (y - b)/a} $$


Noting that dy/dx = a, and plugging in (5.49), we obtain

$$ p(y) = \frac{1}{\sqrt{2\pi a^2 v^2}} \exp\left(-\frac{(y - (am + b))^2}{2a^2v^2}\right) $$

Comparing with (5.49), we can see that Y is also Gaussian, with mean $m_Y = am + b$ and variance
$v_Y^2 = a^2v^2$. This is important enough to summarize and restate.

Gaussianity is preserved under scaling and translation

If $X \sim N(m, v^2)$, then $Y = aX + b \sim N(am + b, a^2v^2)$.
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As  a  consequence  of  the  preceding  result,  any  Gaussian  random  variable  can  be  scaled  and 
translated  to  obtain  a  “standard”  Gaussian  random  variable  with  zero  mean  and  unit  variance. 
For  X  ~  N{m,v‘^),  Y  =  aX  +  6  ~  iV(0, 1)  if  am  +  6  =  0  and  =  1  to  have  a  =  v,  b  =  —vm. 
That  is,  Y  =  (X  —  m)/n  ~  N{0, 1). 

Standard  Gaussian  random  variable 

A  standard  Gaussian  random  variable  iV(0, 1)  has  mean  zero  and  variance  one. 

Conversion  of  a  Gaussian  random  variable  into  standard  form 

If  X  ~  X(m,  v^),  then  ^  ~  X(0, 1). 

As the following example illustrates, this enables us to express probabilities involving any Gaussian random variable as probabilities involving a standard Gaussian random variable.

Example 5.6.1 Suppose that $X \sim N(5, 9)$. Then $(X - 5)/\sqrt{9} = (X - 5)/3 \sim N(0, 1)$. Any probability involving $X$ can now be expressed as a probability involving a standard Gaussian random variable. For example,

$$P[X > 11] = P[(X - 5)/3 > (11 - 5)/3] = P[N(0, 1) > 2]$$

We therefore set aside special notation for the cumulative distribution function (CDF) $\Phi(x)$ and complementary cumulative distribution function (CCDF) $Q(x)$ of a standard Gaussian random variable. By virtue of the standard form conversion, we can now express probabilities involving any Gaussian random variable in terms of the $\Phi$ or $Q$ functions. The definitions of these functions are illustrated in Figure 5.12, and the corresponding formulas are specified below.


Figure 5.12: The $\Phi$ and $Q$ functions are obtained by integrating the $N(0, 1)$ density over appropriate intervals.


$$\Phi(x) = P[N(0, 1) < x] = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt \qquad (5.50)$$

$$Q(x) = P[N(0, 1) > x] = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt \qquad (5.51)$$

See Figure 5.13 for a plot of these functions. By definition, $\Phi(x) + Q(x) = 1$. Furthermore, by the symmetry of the Gaussian density around zero, $Q(-x) = \Phi(x)$. Combining these observations, we note that $Q(-x) = 1 - Q(x)$, so that it suffices to consider only positive arguments for the $Q$ function in order to compute probabilities of interest.

Let us now consider a few more Gaussian probability computations.
Figure 5.13: The $\Phi$ and $Q$ functions.


Example 5.6.2 $X$ is a Gaussian random variable with mean $m = -5$ and variance $v^2 = 4$. Find expressions in terms of the Q function with positive arguments for the following probabilities: $P[X > 3]$, $P[X < -8]$, $P[X < -1]$, $P[3 < X < 6]$, $P[X^2 - 2X > 15]$.

Solution: We solve this problem by normalizing $X$ to a standard Gaussian random variable:

$$\frac{X - m}{v} = \frac{X + 5}{2} \sim N(0, 1)$$

$$P[X > 3] = P\left[\frac{X + 5}{2} > \frac{3 + 5}{2} = 4\right] = Q(4)$$

$$P[X < -8] = P\left[\frac{X + 5}{2} < \frac{-8 + 5}{2} = -1.5\right] = \Phi(-1.5) = Q(1.5)$$

$$P[X < -1] = P\left[\frac{X + 5}{2} < \frac{-1 + 5}{2} = 2\right] = \Phi(2) = 1 - Q(2)$$

$$P[3 < X < 6] = P\left[4 = \frac{3 + 5}{2} < \frac{X + 5}{2} < \frac{6 + 5}{2} = 5.5\right]$$
$$= \Phi(5.5) - \Phi(4) = (1 - Q(5.5)) - (1 - Q(4)) = Q(4) - Q(5.5)$$

Computation of the probability that $X^2 - 2X > 15$ requires that we express this event in terms of simpler events by factorization:

$$X^2 - 2X - 15 = X^2 - 5X + 3X - 15 = (X - 5)(X + 3)$$

This shows that $X^2 - 2X > 15$, or $X^2 - 2X - 15 > 0$, if and only if $X - 5 > 0$ and $X + 3 > 0$, or $X - 5 < 0$ and $X + 3 < 0$. The first event simplifies to $X > 5$, and the second to $X < -3$, so that the desired probability is a union of two mutually exclusive events. We therefore have

$$P[X^2 - 2X > 15] = P[X > 5] + P[X < -3] = Q\left(\frac{5 + 5}{2}\right) + \Phi\left(\frac{-3 + 5}{2}\right)$$
$$= Q(5) + \Phi(1) = Q(5) + 1 - Q(1)$$
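The five answers of Example 5.6.2 can be checked numerically. The Python sketch below (standard library only; the function name `Q` is ours) evaluates each closed-form answer through the relation $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$, which is derived later in this section as (5.52), and compares it with a direct evaluation of the original event using the CDF of $N(-5, 4)$ from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def Q(x):
    """Tail probability of N(0,1): Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

X = NormalDist(mu=-5, sigma=2)        # X ~ N(-5, 4); NormalDist takes the std dev

# (closed form from the solution, direct evaluation with the CDF of X)
checks = {
    "P[X > 3]":         (Q(4),             1 - X.cdf(3)),
    "P[X < -8]":        (Q(1.5),           X.cdf(-8)),
    "P[X < -1]":        (1 - Q(2),         X.cdf(-1)),
    "P[3 < X < 6]":     (Q(4) - Q(5.5),    X.cdf(6) - X.cdf(3)),
    "P[X^2 - 2X > 15]": (Q(5) + 1 - Q(1),  (1 - X.cdf(5)) + X.cdf(-3)),
}
for name, (closed_form, direct) in checks.items():
    print(f"{name:17s} closed form = {closed_form:.6e}   direct = {direct:.6e}")
```

The two columns agree to many digits, which confirms each standardized threshold in the solution.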


Interpreting the transformation to standard Gaussian: For $X \sim N(m, v^2)$, the transformation to standard Gaussian tells us that

$$P[X > m + av] = P\left[\frac{X - m}{v} > a\right] = Q(a)$$


That is, the tail probability of a Gaussian random variable depends only on the number of standard deviations $a$ away from the mean. More generally, the transformation is equivalent to the observation that the probability of an infinitesimal interval $[x, x + \Delta x]$ depends only on its normalized distance from the mean, $(x - m)/v$, and its normalized length, $\Delta x / v$:

$$P[x < X < x + \Delta x] \approx p(x)\,\Delta x = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - m}{v}\right)^2\right) \frac{\Delta x}{v}$$
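Both observations can be checked numerically with Python's `statistics.NormalDist` (standard library, Python 3.8 or later). The values $m = 1$, $v = 2$, $a = 1.5$, $x = 2.5$ and $\Delta x = 10^{-3}$ below are arbitrary illustrative choices, not values from the text.

```python
from statistics import NormalDist

m, v = 1.0, 2.0                 # illustrative parameters: X ~ N(1, 4)
X = NormalDist(m, v)            # NormalDist(mean, standard deviation)
Z = NormalDist()                # standard Gaussian N(0, 1)

# Tail probability: P[X > m + a v] should equal Q(a) = P[N(0,1) > a].
a = 1.5
tail_X = 1 - X.cdf(m + a * v)
tail_std = 1 - Z.cdf(a)

# Small interval: P[x < X < x + dx] is approximately phi((x - m)/v) * (dx / v),
# which depends only on the normalized offset and the normalized length.
x, dx = 2.5, 1e-3
exact = X.cdf(x + dx) - X.cdf(x)
approx = Z.pdf((x - m) / v) * dx / v
print(tail_X, tail_std)
print(exact, approx)
```

The interval approximation is first order in $\Delta x$, so its relative error shrinks in proportion to $\Delta x$.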


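Relating the Q function to the error function: Mathematical software packages such as Matlab often list the error function and the complementary error function, defined for $x > 0$ by

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt$$

$$\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_x^\infty e^{-t^2}\, dt$$

Recognizing the form of the $N(0, \frac{1}{2})$ density, given by $\frac{1}{\sqrt{\pi}} e^{-t^2}$, we see that

$$\mathrm{erf}(x) = 2P[0 < X < x], \quad \mathrm{erfc}(x) = 2P[X > x]$$

where $X \sim N(0, \frac{1}{2})$. Transforming to standard Gaussian as usual, we see that

$$\mathrm{erfc}(x) = 2P[X > x] = 2P\left[\frac{X - 0}{\sqrt{1/2}} > \frac{x - 0}{\sqrt{1/2}}\right] = 2Q(x\sqrt{2})$$

We can invert this to compute the Q function for positive arguments in terms of the complementary error function, as follows:

$$Q(x) = \frac{1}{2}\,\mathrm{erfc}\left(\frac{x}{\sqrt{2}}\right), \quad x \ge 0 \qquad (5.52)$$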

For $x < 0$, we can compute $Q(x) = 1 - Q(-x)$, using the preceding equation to evaluate the right-hand side. While the Communications System Toolbox in Matlab has the Q function built in as qfunc(·), we provide below a Matlab code fragment for computing the Q function based on the complementary error function (which is available without a subscription to separate toolboxes).

Code  Fragment  5.6.1  (Computing  the  Q  function) 

% Q function computed using erfc (works for vector inputs)
function z = qfunction(x)
b = (x >= 0);
y1 = b.*x; % select the positive entries of x
y2 = (1-b).*(-x); % select, and flip the sign of, negative entries in x
z1 = (0.5*erfc(y1./sqrt(2))).*b; % Q(x) for positive entries in x
z2 = (1-0.5*erfc(y2./sqrt(2))).*(1-b); % Q(x) = 1 - Q(-x) for negative entries in x
z = z1 + z2; % final answer (works for x with positive or negative entries)
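For readers working in Python rather than Matlab, the sketch below is one way to compute the Q function with the standard library. Because `math.erfc` accepts negative arguments and $\mathrm{erfc}(-y) = 2 - \mathrm{erfc}(y)$, the relation $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$ in fact holds for all real $x$, so the sign split used above is not needed here. As a sanity check, the sketch also evaluates definition (5.51) directly with Simpson's rule. The names `qfunction` and `q_by_integration`, the truncation point 12 and the grid size are our own assumptions.

```python
import math

def qfunction(x):
    """Q(x) = 0.5*erfc(x/sqrt(2)); valid for any real x since math.erfc is."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def q_by_integration(x, upper=12.0, n=20_000):
    """Q(x) from definition (5.51): integrate the N(0,1) density from x to `upper`
    with Simpson's rule (n must be even). The tail beyond `upper` is about 2e-33."""
    def phi(t):
        return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    h = (upper - x) / n
    s = phi(x) + phi(upper)
    s += 4 * sum(phi(x + k * h) for k in range(1, n, 2))
    s += 2 * sum(phi(x + k * h) for k in range(2, n, 2))
    return s * h / 3

for x in (-1.0, 0.0, 1.0, 3.0):
    print(f"x = {x:+.1f}   erfc-based: {qfunction(x):.10f}   integrated: {q_by_integration(x):.10f}")
```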

Example 5.6.3 (Binary on-off keying in Gaussian noise) A received sample $Y$ in a communication system is modeled as follows: $Y = m + N$ if 1 is sent, and $Y = N$ if 0 is sent, where $N \sim N(0, v^2)$ is the contribution of the receiver noise to the sample, and where $|m|$ is a measure of the signal strength. Assuming that $m > 0$, suppose that we use the simple decision rule that splits the difference between the average values of the observation under the two scenarios: say that 1 is sent if $Y > m/2$, and say that 0 is sent if $Y < m/2$. Assuming that both 0 and 1 are equally likely to be sent, the signal power is $(1/2)m^2 + (1/2)0^2 = m^2/2$. The noise power is $E[N^2] = v^2$. Thus, $\mathrm{SNR} = \frac{m^2}{2v^2}$.

(a) What is the conditional probability of error, conditioned on 0 being sent?

(b) What is the conditional probability of error, conditioned on 1 being sent?

(c) What is the (unconditional) probability of error if 0 and 1 are equally likely to have been sent?

(d) What is the error probability for an SNR of 13 dB?

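Solution:

(a) Since $Y \sim N(0, v^2)$ given that 0 is sent, the conditional probability of error is given by

$$P_{e|0} = P[\text{say 1} \mid \text{0 sent}] = P[Y > m/2 \mid \text{0 sent}] = Q\left(\frac{m/2 - 0}{v}\right) = Q\left(\frac{m}{2v}\right)$$

(b) Since $Y \sim N(m, v^2)$ given that 1 is sent, the conditional probability of error is given by

$$P_{e|1} = P[\text{say 0} \mid \text{1 sent}] = P[Y < m/2 \mid \text{1 sent}] = \Phi\left(\frac{m/2 - m}{v}\right) = Q\left(\frac{m}{2v}\right)$$

(c) If $\pi_0$ is the probability of sending 0, then the unconditional error probability is given by

$$P_e = \pi_0 P_{e|0} + (1 - \pi_0) P_{e|1} = Q\left(\frac{m}{2v}\right) = Q\left(\sqrt{\mathrm{SNR}/2}\right)$$

regardless of $\pi_0$ for this particular decision rule.

(d) For SNR of 13 dB, we have $\mathrm{SNR(raw)} = 10^{\mathrm{SNR(dB)}/10} = 10^{1.3} \approx 20$, so that the error probability evaluates to $P_e = Q(\sqrt{10}) \approx 7.8 \times 10^{-4}$.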

Figure 5.14 shows the probability of error on a log scale, plotted against the SNR in dB. This is the first example of the many error probability plots that we will see in this chapter.

Figure 5.14: Probability of error versus SNR for on-off keying.

A Matlab code fragment (cosmetic touches omitted) for generating Figure 5.14 in Example 5.6.3 is given below.

Code  Fragment  5.6.2  (Error  probability  computation  and  plotting) 
% Plot of error probability versus SNR for on-off keying
snrdb = -5:0.1:15; % vector of SNRs (in dB) for which to evaluate error prob
snr = 10.^(snrdb/10); % vector of raw SNRs
pe = qfunction(sqrt(snr/2)); % vector of error probabilities
% plot error prob on log scale versus SNR in dB
semilogy(snrdb,pe);
ylabel('Error Probability');
xlabel('SNR (dB)');
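A Python analogue of Code Fragment 5.6.2, written under the assumption that no plotting library is available: it builds the same SNR grid and evaluates $P_e = Q(\sqrt{\mathrm{SNR}/2})$, then prints a few points of the curve instead of drawing it. At 13 dB it gives roughly $7.9 \times 10^{-4}$, close to the $7.8 \times 10^{-4}$ of Example 5.6.3(d), which rounds $10^{1.3}$ to 20.

```python
import math

def qfunction(x):
    """Q(x) = 0.5 * erfc(x / sqrt(2)), valid for any real x."""
    return 0.5 * math.erfc(x / math.sqrt(2))

snrdb = [-5 + 0.1 * k for k in range(201)]        # -5:0.1:15 dB, as in the Matlab fragment
snr = [10 ** (d / 10) for d in snrdb]             # raw SNRs
pe = [qfunction(math.sqrt(s / 2)) for s in snr]   # on-off keying: Pe = Q(sqrt(SNR/2))

# No plotting here: print a few points of the curve instead of calling semilogy.
for target in (0, 5, 10, 13):
    k = round((target + 5) * 10)                  # index of the target dB on the 0.1 dB grid
    print(f"SNR = {snrdb[k]:5.1f} dB   Pe = {pe[k]:.2e}")
```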


The preceding example illustrates a more general observation for signaling in AWGN: the probability of error involves terms such as $Q(\sqrt{a\,\mathrm{SNR}})$, where the scale factor $a$ depends on properties of the signal constellation, and SNR is the signal-to-noise ratio. It is therefore of interest to understand how the error probability decays with SNR. As shown in Appendix 5.A, there are tight analytical bounds for the Q function which can be used to deduce that it decays exponentially in the square of its argument, as stated in the following.

Asymptotics of Q(x) for large arguments: For large $x > 0$, the exponential decay of the Q function dominates. We denote this by

$$Q(x) \doteq e^{-x^2/2}, \quad x \to \infty \qquad (5.53)$$

which is shorthand for the following limiting result:

$$\lim_{x \to \infty} \frac{\log Q(x)}{-x^2/2} = 1 \qquad (5.54)$$
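The limit (5.54) can be watched numerically. The Python sketch below (standard library only; `qfunction` is our helper) prints the ratio $\log Q(x) / (-x^2/2)$ for growing $x$. The approach to 1 is slow because the $\doteq$ notation ignores the slowly varying factor of roughly $1/(x\sqrt{2\pi})$ that multiplies $e^{-x^2/2}$ in $Q(x)$.

```python
import math

def qfunction(x):
    """Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

xs = (2, 5, 10, 20, 30)
ratios = {x: math.log(qfunction(x)) / (-x * x / 2) for x in xs}   # the ratio in (5.54)
for x in xs:
    print(f"x = {x:2d}   Q(x) = {qfunction(x):.3e}   log Q(x) / (-x^2/2) = {ratios[x]:.4f}")
```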


These asymptotics play a key role in the design of communication systems. Since events that cause bit errors have probabilities involving terms such as $Q(\sqrt{a\,\mathrm{SNR}}) \doteq e^{-a\,\mathrm{SNR}/2}$, when there are multiple events that can cause bit errors, the ones with the smallest rates of decay $a$ dominate performance. We can therefore focus on these worst-case events in our designs for moderate and high SNR. This simplistic view does not quite hold in heavily coded systems operating at low SNR, but is still an excellent perspective for arriving at a coarse link design.
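To see the dominance argument in numbers, the Python sketch below uses a hypothetical error probability made up of two terms, $Q(\sqrt{a_1 \mathrm{SNR}}) + Q(\sqrt{a_2 \mathrm{SNR}})$ with $a_1 = 1$ and $a_2 = 2$; these decay rates are chosen purely for illustration and do not come from a specific constellation. The ratio of the full expression to the slowest-decaying term alone tends to 1 as the SNR grows.

```python
import math

def qfunction(x):
    """Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# Hypothetical system with two error events whose decay rates are a1 = 1 and a2 = 2.
a1, a2 = 1.0, 2.0

def pe_total(snr):
    return qfunction(math.sqrt(a1 * snr)) + qfunction(math.sqrt(a2 * snr))

def pe_dominant(snr):
    return qfunction(math.sqrt(a1 * snr))   # keep only the slowest-decaying term

ratios = {}
for snr_db in (0, 5, 10, 15):
    snr = 10 ** (snr_db / 10)
    ratios[snr_db] = pe_total(snr) / pe_dominant(snr)
    print(f"{snr_db:2d} dB: total / dominant = {ratios[snr_db]:.6f}")
```

At 0 dB the second term still contributes noticeably, while by 15 dB it is negligible.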


5.6.1  Joint  Gaussianity 

Often, we need to deal with multiple Gaussian random variables defined on the same probability space. These might arise, for example, when we sample filtered WGN. In many situations of interest, not only are such random variables individually Gaussian, but they satisfy a stronger joint Gaussianity property. Just as a Gaussian random variable is characterized by its mean and variance, jointly Gaussian random variables are characterized by means and covariances. We are also interested in what happens to these random variables under linear operations, corresponding, for example, to filtering. Hence, we first review mean and covariance, and their evolution under linear operations and translations, for arbitrary random variables defined on the same probability space.

Covariance: The covariance of random variables $X_1$ and $X_2$ measures the correlation between how they vary around their means, and is given by

$$\mathrm{cov}(X_1, X_2) = E[(X_1 - E[X_1])(X_2 - E[X_2])] = E[X_1X_2] - E[X_1]E[X_2]$$

The second formula is obtained from the first by multiplying out and simplifying:

$$E[(X_1 - E[X_1])(X_2 - E[X_2])] = E[X_1X_2 - E[X_1]X_2 + E[X_1]E[X_2] - X_1E[X_2]]$$
$$= E[X_1X_2] - E[X_1]E[X_2] + E[X_1]E[X_2] - E[X_1]E[X_2] = E[X_1X_2] - E[X_1]E[X_2]$$

where we use the linearity of the expectation operator to pull out constants.

Uncorrelatedness: $X_1$ and $X_2$ are said to be uncorrelated if $\mathrm{cov}(X_1, X_2) = 0$.

Independent random variables are uncorrelated: If $X_1$ and $X_2$ are independent, then

$$\mathrm{cov}(X_1, X_2) = E[X_1X_2] - E[X_1]E[X_2] = E[X_1]E[X_2] - E[X_1]E[X_2] = 0$$

The converse is not true in general; that is, uncorrelated random variables need not be independent. However, we shall see that jointly Gaussian uncorrelated random variables are indeed independent.
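A standard counterexample makes the failure of the converse concrete. In the exact-arithmetic Python sketch below (using `fractions.Fraction`; the pmf and the helper `E` are illustrative), $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$. Both covariance formulas above give zero, yet $X$ and $Y$ are clearly dependent. This pair is not jointly Gaussian, so it does not conflict with the Gaussian result mentioned above.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X**2 is a deterministic function of X.
pmf = {x: Fraction(1, 3) for x in (-1, 0, 1)}

def E(g):
    """Exact expectation of g(X) under the finite pmf."""
    return sum(p * g(x) for x, p in pmf.items())

EX = E(lambda x: x)                                    # 0
EY = E(lambda x: x**2)                                 # 2/3
cov_def = E(lambda x: (x - EX) * (x**2 - EY))          # E[(X - E[X])(Y - E[Y])]
cov_short = E(lambda x: x * x**2) - EX * EY            # E[XY] - E[X]E[Y]
print("cov via definition:", cov_def, " cov via shortcut:", cov_short)

# Dependence: P[X = 0, Y = 0] = P[X = 0] = 1/3, but P[X = 0] * P[Y = 0] = 1/9.
p_joint = pmf[0]
p_product = pmf[0] * sum(p for x, p in pmf.items() if x**2 == 0)
print("P[X=0, Y=0] =", p_joint, " P[X=0] P[Y=0] =", p_product)
```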

Variance: Note that the variance of a random variable is its covariance with itself:

$$\mathrm{var}(X) = \mathrm{cov}(X, X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$$

The use of matrices and vectors provides a compact way of representing and manipulating means and covariances, especially using software programs such as Matlab. Thus, for random variables $X_1, \ldots, X_m$, we define the random vector $\mathbf{X} = (X_1, \ldots, X_m)^T$, and arrange the means and pairwise covariances in a vector and matrix, respectively, as follows.

Mean vector and covariance matrix: Consider an arbitrary $m$-dimensional random vector $\mathbf{X} = (X_1, \ldots, X_m)^T$. The $m \times 1$ mean vector of $\mathbf{X}$ is defined as $\mathbf{m}_X = E[\mathbf{X}] = (E[X_1], \ldots, E[X_m])^T$. The $m \times m$ covariance matrix $\mathbf{C}_X$ has $(i, j)$th entry given by the covariance between the $i$th and $j$th random variables:

$$C_X(i, j) = \mathrm{cov}(X_i, X_j) = E[(X_i - E[X_i])(X_j - E[X_j])] = E[X_iX_j] - E[X_i]E[X_j]$$

More compactly,

$$\mathbf{C}_X = E[(\mathbf{X} - E[\mathbf{X}])(\mathbf{X} - E[\mathbf{X}])^T] = E[\mathbf{X}\mathbf{X}^T] - E[\mathbf{X}](E[\mathbf{X}])^T$$

Notes  on  covariance  computation:  Computations  of  variance  and  covariance  come  up  often 
when  we  deal  with  Gaussian  random  variables,  hence  it  is  useful  to  note  the  following  properties 
of  covariance. 

Property  1:  Covariance  is  unaffected  by  adding  constants. 

$\mathrm{cov}(X + a, Y + b) = \mathrm{cov}(X, Y)$ for any constants $a, b$

Covariance  provides  a  measure  of  the  correlation  between  random  variables  after  subtracting  out 
their  means,  hence  adding  constants  to  the  random  variables  (which  just  translates  their  means) 
does  not  affect  covariance. 

Property 2: Covariance is a bilinear function (i.e., it is linear in both its arguments).

$$\mathrm{cov}(a_1X_1 + a_2X_2, a_3X_3 + a_4X_4) = a_1a_3\,\mathrm{cov}(X_1, X_3) + a_1a_4\,\mathrm{cov}(X_1, X_4) + a_2a_3\,\mathrm{cov}(X_2, X_3) + a_2a_4\,\mathrm{cov}(X_2, X_4)$$


By  Property  1,  it  is  clear  that  we  can  always  consider  zero  mean,  or  centered,  versions  of  random 
variables  when  computing  the  covariance.  An  example  that  frequently  arises  in  performance 
analysis  of  communication  systems  is  a  random  variable  which  is  a  sum  of  a  deterministic  term 
(e.g.,  due  to  a  signal),  and  a  zero  mean  random  term  (e.g.  due  to  noise).  In  this  case,  dropping 
the  signal  term  is  often  convenient  when  computing  variance  or  covariance. 

Affine transformations: For a random vector $\mathbf{X}$, the analogue of scaling and translating a random variable is a linear transformation using a matrix, together with a translation. Such a transformation is called an affine transformation. That is, $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ is an affine transformation of $\mathbf{X}$, where $\mathbf{A}$ is a deterministic matrix and $\mathbf{b}$ a deterministic vector.
Example 5.6.4 (Mean and variance after an affine transformation): Let $Y = X_1 - 2X_2 + 4$, where $X_1$ has mean $-1$ and variance 4, $X_2$ has mean 2 and variance 9, and the covariance $\mathrm{cov}(X_1, X_2) = -3$. Find the mean and variance of $Y$.

Solution: The mean is given by

$$E[Y] = E[X_1] - 2E[X_2] + 4 = -1 - 2(2) + 4 = -1$$

The variance is computed as

$$\mathrm{var}(Y) = \mathrm{cov}(Y, Y) = \mathrm{cov}(X_1 - 2X_2 + 4, X_1 - 2X_2 + 4)$$
$$= \mathrm{cov}(X_1, X_1) - 2\,\mathrm{cov}(X_1, X_2) - 2\,\mathrm{cov}(X_2, X_1) + 4\,\mathrm{cov}(X_2, X_2)$$

where the constant drops out because of Property 1. We therefore obtain that

$$\mathrm{var}(Y) = \mathrm{cov}(X_1, X_1) - 4\,\mathrm{cov}(X_1, X_2) + 4\,\mathrm{cov}(X_2, X_2) = 4 - 4(-3) + 4(9) = 52$$


Computations  such  as  those  in  the  preceding  example  can  be  compactly  represented  in  terms 
of  matrices  and  vectors,  which  is  particularly  useful  for  computations  for  random  vectors.  In 
general,  an  affine  transformation  maps  one  random  vector  into  another  (of  possibly  different 
dimension),  and  the  mean  vector  and  covariance  matrix  evolve  as  follows. 

Mean  and  covariance  evolution  under  affine  transformation 

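If $\mathbf{X}$ has mean $\mathbf{m}$ and covariance $\mathbf{C}$, and $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$,
then $\mathbf{Y}$ has mean $\mathbf{m}_Y = \mathbf{A}\mathbf{m} + \mathbf{b}$ and covariance $\mathbf{C}_Y = \mathbf{A}\mathbf{C}\mathbf{A}^T$.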

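To see this, first compute the mean vector of $\mathbf{Y}$ using the linearity of the expectation operator:

$$\mathbf{m}_Y = E[\mathbf{Y}] = E[\mathbf{A}\mathbf{X} + \mathbf{b}] = \mathbf{A}E[\mathbf{X}] + \mathbf{b} = \mathbf{A}\mathbf{m} + \mathbf{b} \qquad (5.55)$$

This also implies that the “zero mean” version of $\mathbf{Y}$ is given by

$$\mathbf{Y} - E[\mathbf{Y}] = (\mathbf{A}\mathbf{X} + \mathbf{b}) - (\mathbf{A}\mathbf{m} + \mathbf{b}) = \mathbf{A}(\mathbf{X} - \mathbf{m})$$

so that the covariance matrix of $\mathbf{Y}$ is given by

$$\mathbf{C}_Y = E[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{Y} - E[\mathbf{Y}])^T] = E[\mathbf{A}(\mathbf{X} - \mathbf{m})(\mathbf{X} - \mathbf{m})^T\mathbf{A}^T] = \mathbf{A}\mathbf{C}\mathbf{A}^T \qquad (5.56)$$

Note that the dimensions of $\mathbf{X}$ and $\mathbf{Y}$ can be different: $\mathbf{X}$ can be $m \times 1$, $\mathbf{A}$ can be $n \times m$, and $\mathbf{Y}$, $\mathbf{b}$ can be $n \times 1$, where $m, n$ are arbitrary. We also note below that mean and covariance evolve separately under such transformations.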

Mean  and  covariance  evolve  separately  under  affine  transformations:  The  mean  of  Y 
depends  only  on  the  mean  of  X,  and  the  covariance  of  Y  depends  only  on  the  covariance  of  X. 
Furthermore,  the  additive  constant  b  in  the  transformation  does  not  affect  the  covariance,  since 
it  influences  only  the  mean  of  Y. 

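Example 5.6.4 redone: We can check that we get the same result as before by setting

$$\mathbf{m}_X = \begin{pmatrix} -1 \\ 2 \end{pmatrix}, \quad \mathbf{C}_X = \begin{pmatrix} 4 & -3 \\ -3 & 9 \end{pmatrix}, \quad \mathbf{A} = \begin{pmatrix} 1 & -2 \end{pmatrix}, \quad \mathbf{b} = 4 \qquad (5.57)$$

and applying (5.55) and (5.56).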
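Equations (5.55) and (5.56) translate directly into a few lines of matrix arithmetic. The pure-Python sketch below applies them to the data in (5.57) and recovers the mean $-1$ and variance 52 found in Example 5.6.4. The helpers `matmul`, `transpose` and `affine_moments` are illustrative names of ours; in practice a numerical library would normally handle the matrix products.

```python
def matmul(P, R):
    """Product of matrices stored as lists of rows."""
    return [[sum(P[i][k] * R[k][j] for k in range(len(R))) for j in range(len(R[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def affine_moments(A, b, m, C):
    """Mean and covariance of Y = A X + b: m_Y = A m + b (5.55), C_Y = A C A^T (5.56)."""
    mY = [sum(A[i][k] * m[k] for k in range(len(m))) + b[i] for i in range(len(A))]
    CY = matmul(matmul(A, C), transpose(A))
    return mY, CY

# Data from (5.57): Y = X1 - 2 X2 + 4
mX = [-1, 2]
CX = [[4, -3], [-3, 9]]
A = [[1, -2]]
b = [4]

mY, CY = affine_moments(A, b, mX, CX)
print(mY, CY)   # [-1] [[52]]
```

Because the helpers accept any compatible shapes, the same function also covers the $n \times m$ case noted above.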

Jointly Gaussian random variables, or Gaussian random vectors: Random variables $X_1, \ldots, X_m$ defined on a common probability space are said to be jointly Gaussian, or the $m \times 1$ random vector $\mathbf{X} = (X_1, \ldots, X_m)^T$ is termed a Gaussian random vector, if any linear combination of these random variables is a Gaussian random variable. That is, for any scalar constants $a_1, \ldots, a_m$, the random variable $a_1X_1 + \cdots + a_mX_m$ is Gaussian.

A Gaussian random vector is completely characterized by its mean vector and covariance matrix: This is a generalization of the observation that a Gaussian random variable is completely characterized by its mean and variance. We derive this in Problem 5.48, but provide an intuitive argument here. The definition of joint Gaussianity only requires us to characterize the distribution of an arbitrarily chosen linear combination of $X_1, \ldots, X_m$. For a Gaussian random vector $\mathbf{X} = (X_1, \ldots, X_m)^T$, consider $Y = a_1X_1 + \cdots + a_mX_m$, where $a_1, \ldots, a_m$ can be any scalar constants. By definition, $Y$ is a Gaussian random variable, and is completely characterized by its mean and variance. We can compute these in terms of $\mathbf{m}_X$ and $\mathbf{C}_X$ using (5.55) and (5.56) by noting that $Y = \mathbf{a}^T\mathbf{X}$, where $\mathbf{a} = (a_1, \ldots, a_m)^T$. Thus,

$$m_Y = \mathbf{a}^T\mathbf{m}_X$$
$$C_Y = \mathrm{var}(Y) = \mathbf{a}^T\mathbf{C}_X\mathbf{a}$$

We have therefore shown that we can characterize the mean and variance, and hence the density, of an arbitrarily chosen linear combination $Y$ if and only if we know the mean vector $\mathbf{m}_X$ and covariance matrix $\mathbf{C}_X$. As we see in Problem 5.48, this is the basis for the desired result that the distribution of a Gaussian random vector $\mathbf{X}$ is completely characterized by $\mathbf{m}_X$ and $\mathbf{C}_X$.

Notation for joint Gaussianity: We use the notation $\mathbf{X} \sim N(\mathbf{m}, \mathbf{C})$ to denote a Gaussian random vector $\mathbf{X}$ with mean vector $\mathbf{m}$ and covariance matrix $\mathbf{C}$.

The preceding definitions and observations regarding joint Gaussianity apply even when the random variables involved do not have a joint density. For example, it is easy to check that, according to this definition, $X_1$ and $X_2 = 4X_1 - 1$ are jointly Gaussian. However, the joint density of $X_1$ and $X_2$ is not well-defined (unless we allow delta functions), since all of the probability mass in the two-dimensional $(x_1, x_2)$ plane is collapsed onto the line $x_2 = 4x_1 - 1$. Of course, since $X_2$ is completely determined by $X_1$, any probability involving $X_1, X_2$ can be expressed in terms of $X_1$ alone. In general, when the $m$-dimensional joint density does not exist, probabilities involving $X_1, \ldots, X_m$ can be expressed in terms of a smaller number of random variables, and can be evaluated using a joint density over a lower-dimensional space. A necessary and sufficient condition for the joint density to exist is that the covariance matrix is invertible.

Joint Gaussian density exists if and only if the covariance matrix is invertible: We do not prove this result, but discuss it in the context of the two-dimensional density in Example 5.6.5.

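Joint Gaussian density: For $\mathbf{X} = (X_1, \ldots, X_m)^T \sim N(\mathbf{m}, \mathbf{C})$, if $\mathbf{C}$ is invertible, the joint density exists and takes the following form (we skip the derivation, but see Problem 5.48):

$$p(x_1, \ldots, x_m) = p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^m |\mathbf{C}|}} \exp\left(-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T \mathbf{C}^{-1} (\mathbf{x} - \mathbf{m})\right) \qquad (5.58)$$

where $|\mathbf{C}|$ denotes the determinant of $\mathbf{C}$.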


Example 5.6.5 (Two-dimensional joint Gaussian density) In order to visualize the joint Gaussian density (this is not needed for the remainder of the development, hence this example can be skipped), let us consider two jointly Gaussian random variables $X$ and $Y$. In this case, it is convenient to define the normalized correlation between $X$ and $Y$ as

$$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} \qquad (5.59)$$
(a) Joint Gaussian density   (b) Contours of density

Figure 5.15: Joint Gaussian density and its contours for $\sigma_X^2 = 1$, $\sigma_Y^2 = 4$ and $\rho = -0.5$.

Thus, $\mathrm{cov}(X, Y) = \rho\sigma_X\sigma_Y$, where $\mathrm{var}(X) = \sigma_X^2$, $\mathrm{var}(Y) = \sigma_Y^2$, and the covariance matrix for the random vector $(X, Y)^T$ is given by

$$\mathbf{C} = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix} \qquad (5.60)$$


It is shown in Problem 5.47 that $|\rho| \le 1$. For $|\rho| = 1$, it is easy to check that the covariance matrix has determinant zero (indeed, $|\mathbf{C}| = \sigma_X^2\sigma_Y^2(1 - \rho^2)$), hence the joint density formula (5.58) cannot be applied. As shown in Problem 5.47, this has a simple geometric interpretation: $|\rho| = 1$ corresponds to a situation when $X$ and $Y$ are affine functions of each other, so that all of the probability mass is concentrated on a line, hence a two-dimensional density does not exist. Thus, we need the strict inequality $|\rho| < 1$ for the covariance matrix to be invertible. Assuming that $|\rho| < 1$, we plug (5.60) into (5.58), setting the mean vector to zero without loss of generality (a nonzero mean vector simply shifts the density). We get the joint density shown in Figure 5.15 for $\sigma_X^2 = 1$, $\sigma_Y^2 = 4$ and $\rho = -0.5$. Since $Y$ has larger variance, the density decays more slowly in $Y$ than in $X$. The negative normalized correlation leads to contour plots given by tilted ellipses, corresponding to setting the quadratic function $\mathbf{x}^T\mathbf{C}^{-1}\mathbf{x}$ in the exponent of the density to different constants.
Exercise: Show that the ellipses shown in Figure 5.15(b) can be described as

$$x^2 + ay^2 + bxy = c$$

specifying the values of $a$ and $b$.


While  we  hardly  ever  integrate  the  joint  Gaussian  density  to  compute  probabilities,  we  use  its 
form  to  derive  many  important  results.  One  such  result  is  stated  below. 

Uncorrelated jointly Gaussian random variables are independent: This follows from the form of the joint Gaussian density (5.58). If $X_1, \ldots, X_m$ are pairwise uncorrelated, then the off-diagonal entries of the covariance matrix $\mathbf{C}$ are zero: $C(i, j) = 0$ for $i \neq j$. Thus, $\mathbf{C}$ and $\mathbf{C}^{-1}$ are both diagonal matrices, with diagonal entries given by $C(i, i) = v_i^2$ and $1/v_i^2$, respectively, $i = 1, \ldots, m$, and determinant $|\mathbf{C}| = v_1^2 \cdots v_m^2$. In this case, we see that the joint density (5.58) decomposes into a product of marginal densities:

$$p(x_1, \ldots, x_m) = \frac{1}{\sqrt{2\pi v_1^2}} e^{-\frac{(x_1 - m_1)^2}{2v_1^2}} \cdots \frac{1}{\sqrt{2\pi v_m^2}} e^{-\frac{(x_m - m_m)^2}{2v_m^2}} = p(x_1) \cdots p(x_m)$$

so that $X_1, \ldots, X_m$ are independent.
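The density formula (5.58) and the factorization above can be checked in the two-dimensional case. The Python sketch below writes out the $2 \times 2$ inverse and determinant by hand (a general $m$-dimensional version would normally rely on a linear-algebra library); the function names `gauss_pdf_1d` and `gauss_pdf_2d` and the evaluation point are illustrative choices. With a diagonal covariance the joint density equals the product of the marginals, and a singular covariance such as the $|\rho| = 1$ case of Example 5.6.5 is rejected.

```python
import math

def gauss_pdf_1d(x, m, var):
    """Scalar N(m, var) density."""
    return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gauss_pdf_2d(x, m, C):
    """Joint Gaussian density (5.58) for two variables, with the 2x2 inverse written out."""
    (c11, c12), (c21, c22) = C
    det = c11 * c22 - c12 * c21
    if det <= 0:
        raise ValueError("covariance matrix is not invertible: no joint density")
    inv = [[c22 / det, -c12 / det], [-c21 / det, c11 / det]]
    d = [x[0] - m[0], x[1] - m[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-quad / 2) / math.sqrt((2 * math.pi) ** 2 * det)

# Uncorrelated case: diagonal C, so the joint density should factor into marginals.
point, mean = (0.3, -1.2), (0.0, 1.0)
joint = gauss_pdf_2d(point, mean, [[1.0, 0.0], [0.0, 4.0]])
product = gauss_pdf_1d(0.3, 0.0, 1.0) * gauss_pdf_1d(-1.2, 1.0, 4.0)
print(joint, product)
```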

Recall  that,  while  independent  random  variables  are  uncorrelated,  the  converse  need  not  be  true. 
However,  when  we  put  the  additional  restriction  of  joint  Gaussianity,  uncorrelatedness  does  imply 
independence. 

We can now characterize the distribution of affine transformations of jointly Gaussian random variables. If $\mathbf{X}$ is a Gaussian random vector, then $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ is also Gaussian. To see this, note that any linear combination of $Y_1, \ldots, Y_n$ equals a linear combination of $X_1, \ldots, X_m$ (plus a constant), which is a Gaussian random variable by the Gaussianity of $\mathbf{X}$. Since $\mathbf{Y}$ is Gaussian, its distribution is completely characterized by its mean vector and covariance matrix, which we have just computed. We can now state the following result.

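Joint Gaussianity is preserved under affine transformations

$$\text{If } \mathbf{X} \sim N(\mathbf{m}, \mathbf{C}), \text{ then } \mathbf{A}\mathbf{X} + \mathbf{b} \sim N(\mathbf{A}\mathbf{m} + \mathbf{b}, \mathbf{A}\mathbf{C}\mathbf{A}^T) \qquad (5.61)$$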


Example 5.6.6 (Computations with jointly Gaussian random variables) As in Example 5.6.4, consider two random variables $X_1$ and $X_2$ such that $X_1$ has mean $-1$ and variance 4, $X_2$ has mean 2 and variance 9, and $\mathrm{cov}(X_1, X_2) = -3$. Now assume in addition that these random variables are jointly Gaussian.

(a) Write down the mean vector and covariance matrix for the random vector $\mathbf{Y} = (Y_1, Y_2)^T$, where $Y_1 = 3X_1 - 2X_2 + 3$ and $Y_2 = X_1 + X_2 - 2$.

(b) Evaluate the probability $P[3X_1 - 2X_2 < 5]$ in terms of the Q function with positive arguments.

(c) Suppose that $Z = aX_1 + X_2$. Find the constant $a$ such that $Z$ is independent of $X_1$.

Solution to (a): We have already found the mean and covariance of $\mathbf{X}$ in Example 5.6.4; they are given by (5.57). Now, $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$, where

$$\mathbf{A} = \begin{pmatrix} 3 & -2 \\ 1 & 1 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 3 \\ -2 \end{pmatrix}$$

We can now apply (5.61) to obtain the mean vector and covariance matrix for $\mathbf{Y}$:

$$\mathbf{m}_Y = \mathbf{A}\mathbf{m}_X + \mathbf{b} = \begin{pmatrix} -4 \\ -1 \end{pmatrix}$$

$$\mathbf{C}_Y = \mathbf{A}\mathbf{C}_X\mathbf{A}^T = \begin{pmatrix} 108 & -9 \\ -9 & 7 \end{pmatrix}$$

Solution to (b): Since $Y_1 = 3X_1 - 2X_2 + 3 \sim N(-4, 108)$, the required probability can be written as

$$P[3X_1 - 2X_2 < 5] = P[Y_1 < 8] = \Phi\left(\frac{8 - (-4)}{\sqrt{108}}\right) = \Phi\left(\frac{2}{\sqrt{3}}\right) = 1 - Q\left(\frac{2}{\sqrt{3}}\right)$$

Solution to (c): Since $Z = aX_1 + X_2$ and $X_1$ are jointly Gaussian, they are independent if they are uncorrelated. The covariance is given by

$$\mathrm{cov}(Z, X_1) = \mathrm{cov}(aX_1 + X_2, X_1) = a\,\mathrm{cov}(X_1, X_1) + \mathrm{cov}(X_2, X_1) = 4a - 3$$

so that we need $a = 3/4$ for $Z$ and $X_1$ to be independent.
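The numbers in Example 5.6.6 can be reproduced with a short pure-Python sketch (variable names are ours). It takes the first row of $\mathbf{A}$ as $(3, -2)$, that is, $Y_1 = 3X_1 - 2X_2 + 3$, which is the choice consistent with the mean $-4$ and variance 108 in the solution.

```python
import math

def qfunction(x):
    """Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

mX = [-1, 2]
CX = [[4, -3], [-3, 9]]
A = [[3, -2], [1, 1]]      # rows: Y1 = 3X1 - 2X2 + 3,  Y2 = X1 + X2 - 2
b = [3, -2]

# (a) m_Y = A m_X + b and C_Y = A C_X A^T, per (5.61)
mY = [sum(A[i][k] * mX[k] for k in range(2)) + b[i] for i in range(2)]
AC = [[sum(A[i][k] * CX[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
CY = [[sum(AC[i][k] * A[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
print(mY, CY)               # [-4, -1] [[108, -9], [-9, 7]]

# (b) P[Y1 < 8] = Phi((8 - m_Y1)/sqrt(var Y1)) = 1 - Q((8 - m_Y1)/sqrt(var Y1))
p_b = 1 - qfunction((8 - mY[0]) / math.sqrt(CY[0][0]))
print(round(p_b, 4))

# (c) cov(a X1 + X2, X1) = a var(X1) + cov(X2, X1) = 0  gives  a = -cov(X2, X1) / var(X1)
a = -CX[1][0] / CX[0][0]
print(a)                    # 0.75
```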


Discrete time WGN: The noise model $\mathbf{N} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$ is called discrete time white Gaussian noise (WGN). The term white refers to the noise samples being uncorrelated and having equal variance. We will see how such discrete time WGN arises from continuous-time WGN, which we discuss during our coverage of random processes later in this chapter.
Example 5.6.7 (Binary on-off keying in discrete time WGN) Let us now revisit on-off keying, explored for scalar observations in Example 5.6.3, for vector observations. The receiver processes a vector $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ of samples modeled as follows: $\mathbf{Y} = \mathbf{s} + \mathbf{N}$ if 1 is sent, and $\mathbf{Y} = \mathbf{N}$ if 0 is sent, where $\mathbf{s} = (s_1, \ldots, s_n)^T$ is the signal and the noise $\mathbf{N} = (N_1, \ldots, N_n)^T \sim N(\mathbf{0}, \sigma^2\mathbf{I})$. That is, the noise samples $N_1, \ldots, N_n$ are i.i.d. $N(0, \sigma^2)$ random variables. Suppose we use the following correlator-based decision statistic:

$$Z = \mathbf{s}^T\mathbf{Y} = \sum_{k=1}^{n} s_kY_k$$

Thus,  we  have  reduced  the  vector  observation  to  a  single  number  based  on  which  we  will  make 
our  decision.  The  hypothesis  framework  developed  in  Chapter  6  will  be  used  to  show  that  this 
decision  statistic  is  optimal,  in  a  well-defined  sense.  For  now,  we  simply  accept  it  as  given. 

(a)  Find  the  conditional  distribution  of  Z  given  that  0  is  sent. 

(b)  Find  the  conditional  distribution  of  Z  given  that  1  is  sent. 

(c) Observe from (a) and (b) that we are now back to the setting of Example 5.6.3, with $Z$ now playing the role of $Y$. Specify the values of $m$ and $v^2$, and the $\mathrm{SNR} = \frac{m^2}{2v^2}$, in terms of $\mathbf{s}$ and $\sigma^2$.

(d) As in Example 5.6.3, consider the simple decision rule that says 1 is sent if $Z > m/2$, and says 0 is sent if $Z < m/2$. Find the error probability (in terms of the Q function) as a function of $\mathbf{s}$ and $\sigma^2$.

(e) Evaluate the error probability for $\mathbf{s} = (-2, 2, 1)^T$ and $\sigma^2 = 1/4$.

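Solution:

(a) If 0 is sent, then $\mathbf{Y} = \mathbf{N} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$. Applying (5.61) with $\mathbf{m} = \mathbf{0}$, $\mathbf{A} = \mathbf{s}^T$, $\mathbf{C} = \sigma^2\mathbf{I}$, we obtain $Z = \mathbf{s}^T\mathbf{Y} \sim N(0, \sigma^2\|\mathbf{s}\|^2)$.

(b) If 1 is sent, then $\mathbf{Y} = \mathbf{s} + \mathbf{N} \sim N(\mathbf{s}, \sigma^2\mathbf{I})$. Applying (5.61) with $\mathbf{m} = \mathbf{s}$, $\mathbf{A} = \mathbf{s}^T$, $\mathbf{C} = \sigma^2\mathbf{I}$, we obtain $Z = \mathbf{s}^T\mathbf{Y} \sim N(\|\mathbf{s}\|^2, \sigma^2\|\mathbf{s}\|^2)$. Alternatively, $\mathbf{s}^T\mathbf{Y} = \mathbf{s}^T(\mathbf{s} + \mathbf{N}) = \|\mathbf{s}\|^2 + \mathbf{s}^T\mathbf{N}$. Since $\mathbf{s}^T\mathbf{N} \sim N(0, \sigma^2\|\mathbf{s}\|^2)$ from (a), we simply translate the mean by $\|\mathbf{s}\|^2$.

(c) Comparing with Example 5.6.3, we see that $m = \|\mathbf{s}\|^2$, $v^2 = \sigma^2\|\mathbf{s}\|^2$, and $\mathrm{SNR} = \frac{m^2}{2v^2} = \frac{\|\mathbf{s}\|^2}{2\sigma^2}$.

(d) From Example 5.6.3, we know that the decision rule that splits the difference between the means has error probability

$$P_e = P_{e|0} = P_{e|1} = Q\left(\frac{m}{2v}\right) = Q\left(\frac{\|\mathbf{s}\|}{2\sigma}\right)$$

plugging in the expressions for $m$ and $v^2$ from (c).

(e) We have $\|\mathbf{s}\|^2 = 9$. Using (d), we obtain $P_e = Q\left(\frac{3}{2 \cdot 1/2}\right) = Q(3) \approx 0.0013$.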
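Parts (c) through (e) reduce to a few lines of arithmetic. The Python sketch below (standard library only; variable names are ours) computes $\|\mathbf{s}\|^2$, the equivalent scalar parameters $m$ and $v^2$, the SNR and $P_e = Q(m/(2v))$ for $\mathbf{s} = (-2, 2, 1)^T$ and $\sigma^2 = 1/4$.

```python
import math

def qfunction(x):
    """Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

s = [-2, 2, 1]
sigma2 = 1 / 4
energy = sum(sk * sk for sk in s)          # ||s||^2
m, v2 = energy, sigma2 * energy            # Z | 1 sent ~ N(m, v2); Z | 0 sent ~ N(0, v2)
snr = m ** 2 / (2 * v2)                    # equals ||s||^2 / (2 sigma^2)
pe = qfunction(m / (2 * math.sqrt(v2)))    # equals Q(||s|| / (2 sigma))
print(energy, snr, round(pe, 4))           # 9 18.0 0.0013
```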

Noise  is  termed  colored  when  it  is  not  white;  that  is,  when  the  noise  samples  are  correlated  and/or 
have  different  variances.  We  will  see  later  how  colored  noise  arises  from  linear  transformations 
on  white  noise.  Let  us  continue  our  sequence  of  examples  regarding  on-off  keying,  but  now  with 
colored  noise. 

Example 5.6.8 (Binary on-off keying in discrete time colored Gaussian noise) As in the previous example, we have a vector observation $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$, with $\mathbf{Y} = \mathbf{s} + \mathbf{N}$ if 1 is sent, and $\mathbf{Y} = \mathbf{N}$ if 0 is sent, where $\mathbf{s} = (s_1, \ldots, s_n)^T$ is the signal. However, we now allow the noise covariance matrix to be arbitrary: $\mathbf{N} = (N_1, \ldots, N_n)^T \sim N(\mathbf{0}, \mathbf{C}_N)$.

(a) Consider the decision statistic $Z_1 = \mathbf{s}^T\mathbf{Y}$. Find the conditional distributions of $Z_1$ given 0 sent, and given 1 sent.

(b) Show that $Z_1$ follows the scalar on-off keying model of Example 5.6.3, specifying the parameters $m_1$ and $v_1^2$, and $\mathrm{SNR}_1 = \frac{m_1^2}{2v_1^2}$, in terms of $\mathbf{s}$ and $\mathbf{C}_N$.

(c) Find the error probability of the simple decision rule comparing $Z_1$ to the threshold $m_1/2$.

(d) Repeat (a)-(c) for a decision statistic $Z_2 = \mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{Y}$ (use the notation $m_2$, $v_2^2$, and $\mathrm{SNR}_2$ to denote the quantities analogous to those in (b)).

(e) Apply the preceding to the following example: two-dimensional observation $\mathbf{Y} = (Y_1, Y_2)^T$ with $\mathbf{s} = (4, -2)^T$ and

$$\mathbf{C}_N = \begin{pmatrix} 1 & -1 \\ -1 & 4 \end{pmatrix}$$

Find explicit expressions for $Z_1$ and $Z_2$ in terms of $Y_1$ and $Y_2$. Compute and compare the SNRs and error probabilities obtained with the two decision statistics.

Solution:  We  proceed  similarly  to  Example  5.6.7. 

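(a) If 0 is sent, then $\mathbf{Y} = \mathbf{N} \sim N(\mathbf{0}, C_N)$. Applying (5.61) with $\mathbf{m} = \mathbf{0}$, $A = \mathbf{s}^T$, $C = C_N$, we obtain $Z_1 = \mathbf{s}^T \mathbf{Y} \sim N(0, \mathbf{s}^T C_N \mathbf{s})$.

If 1 is sent, then $\mathbf{Y} = \mathbf{s} + \mathbf{N} \sim N(\mathbf{s}, C_N)$. Applying (5.61) with $\mathbf{m} = \mathbf{s}$, $A = \mathbf{s}^T$, $C = C_N$, we obtain $Z_1 = \mathbf{s}^T \mathbf{Y} \sim N(\|\mathbf{s}\|^2, \mathbf{s}^T C_N \mathbf{s})$. Alternatively, $\mathbf{s}^T \mathbf{Y} = \mathbf{s}^T(\mathbf{s} + \mathbf{N}) = \|\mathbf{s}\|^2 + \mathbf{s}^T \mathbf{N}$. Since $\mathbf{s}^T \mathbf{N} \sim N(0, \mathbf{s}^T C_N \mathbf{s})$ from the case of 0 sent above, we simply translate the mean by $\|\mathbf{s}\|^2$.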

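(b) Comparing with Example 5.6.3, we see that $m_1 = \|\mathbf{s}\|^2$, $v_1^2 = \mathbf{s}^T C_N \mathbf{s}$, and

$$\mathrm{SNR}_1 = \frac{m_1^2}{2 v_1^2} = \frac{\|\mathbf{s}\|^4}{2\, \mathbf{s}^T C_N \mathbf{s}}$$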

(c) From Example 5.6.3, we know that the decision rule that splits the difference between the means has error probability

$$P_{e1} = Q\left(\frac{m_1}{2 v_1}\right) = Q\left(\frac{\|\mathbf{s}\|^2}{2\sqrt{\mathbf{s}^T C_N \mathbf{s}}}\right)$$

plugging in the expressions for $m_1$ and $v_1^2$ from (b).

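(d) We now have $Z_2 = \mathbf{s}^T C_N^{-1} \mathbf{Y}$. If 0 is sent, $\mathbf{Y} = \mathbf{N} \sim N(\mathbf{0}, C_N)$. Applying (5.61) with $\mathbf{m} = \mathbf{0}$, $A = \mathbf{s}^T C_N^{-1}$, $C = C_N$, we obtain $Z_2 = \mathbf{s}^T C_N^{-1} \mathbf{Y} \sim N(0, \mathbf{s}^T C_N^{-1} \mathbf{s})$, since $A C A^T = \mathbf{s}^T C_N^{-1} C_N C_N^{-1} \mathbf{s} = \mathbf{s}^T C_N^{-1} \mathbf{s}$ (using the symmetry of $C_N$).

If 1 is sent, then $\mathbf{Y} = \mathbf{s} + \mathbf{N} \sim N(\mathbf{s}, C_N)$. Applying (5.61) with $\mathbf{m} = \mathbf{s}$, $A = \mathbf{s}^T C_N^{-1}$, $C = C_N$, we obtain $Z_2 = \mathbf{s}^T C_N^{-1} \mathbf{Y} \sim N(\mathbf{s}^T C_N^{-1} \mathbf{s}, \mathbf{s}^T C_N^{-1} \mathbf{s})$. That is, $m_2 = v_2^2 = \mathbf{s}^T C_N^{-1} \mathbf{s}$, and

$$\mathrm{SNR}_2 = \frac{m_2^2}{2 v_2^2} = \frac{\mathbf{s}^T C_N^{-1} \mathbf{s}}{2}$$

The corresponding error probability is

$$P_{e2} = Q\left(\frac{m_2}{2 v_2}\right) = Q\left(\frac{\sqrt{\mathbf{s}^T C_N^{-1} \mathbf{s}}}{2}\right)$$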

(e) For the given example, we find $Z_1 = \mathbf{s}^T \mathbf{Y} = 4Y_1 - 2Y_2$ and $Z_2 = \mathbf{s}^T C_N^{-1} \mathbf{Y} = \frac{2}{3}(7Y_1 + Y_2)$. The relative weights of the two observations are quite different in the two cases. Numerical computations using the Matlab script below yield SNRs of 6.2 dB and 9.4 dB, and error probabilities of 0.07 and 0.02 in the two cases, so that $Z_2$ provides better performance than $Z_1$. We shall see in Chapter 6 that $Z_2$ is actually the optimal decision statistic, both in terms of maximizing SNR and minimizing error probability.


A  Matlab  code  fragment  for  generating  the  numerical  results  in  Example  5.6.8(e)  is  given  below. 

Code  Fragment  5.6.3  (Performance  of  on-off  keying  in  colored  Gaussian  noise) 

%OOK with colored noise: N(s,C_N) versus N(0,C_N)
qfunction = @(x) 0.5*erfc(x/sqrt(2)); %Q function via erfc (assumed helper; omit if qfunction is already defined)
s=[4;-2]; %signal
Cn=[1 -1;-1 4]; %noise covariance matrix
%%decision statistic Z1 = s^T Y
m1 = s'*s; %mean if 1 sent
variance1 = s'*Cn*s; %variance under each hypothesis
v1 = sqrt(variance1); %standard deviation
SNR1 = m1^2/(2*variance1); %SNR
Pe1 = qfunction(m1/(2*v1)); %error prob for "split the difference" rule using Z1
%%decision statistic Z2 = s^T Cn^{-1} Y
m2 = s'*(Cn\s); %mean if 1 sent (Cn\s computes inv(Cn)*s)
variance2 = s'*(Cn\s); %variance = mean in this case
v2 = sqrt(variance2); %standard deviation
SNR2 = m2^2/(2*variance2); %reduces to SNR2 = m2/2 in this case
Pe2 = qfunction(m2/(2*v2)); %error prob for "split the difference" rule using Z2
%Compare performance of the two rules
10*log10([SNR1 SNR2]) %SNRs in dB
[Pe1 Pe2] %error probabilities
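As an independent sanity check on Example 5.6.8(e), the following sketch (in Python with only the standard library, not part of the original text) simulates both "split the difference" rules and compares the empirical error rates with the analytic values $Q(m_1/(2v_1))$ and $Q(m_2/(2v_2))$. It assumes equiprobable bits and generates the colored noise as $\mathbf{N} = L\mathbf{U}$, where $\mathbf{U} \sim N(\mathbf{0}, I)$ and $L$ is a Cholesky factor of $C_N$; the function and variable names are illustrative.

```python
import math
import random


def qfunc(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


S = (4.0, -2.0)                               # signal s from Example 5.6.8(e)
CHOL = ((1.0, 0.0), (-1.0, math.sqrt(3.0)))   # L with L L^T = C_N = [[1, -1], [-1, 4]]
W1 = S                                        # Z1 = s^T Y = 4 Y1 - 2 Y2
W2 = (14.0 / 3.0, 2.0 / 3.0)                  # Z2 = s^T C_N^{-1} Y = (2/3)(7 Y1 + Y2)


def simulate_pe(w, trials, rng):
    """Monte Carlo error rate of the rule Z = w^T Y > m/2, where m = E[Z | 1 sent]."""
    m = w[0] * S[0] + w[1] * S[1]             # E[Z | 1 sent]; E[Z | 0 sent] = 0
    errors = 0
    for _ in range(trials):
        u1, u2 = rng.gauss(0, 1), rng.gauss(0, 1)
        n1 = CHOL[0][0] * u1                  # colored noise N = L U with U ~ N(0, I)
        n2 = CHOL[1][0] * u1 + CHOL[1][1] * u2
        bit = rng.random() < 0.5              # equiprobable 0 / 1
        y1, y2 = (S[0] + n1, S[1] + n2) if bit else (n1, n2)
        decide_one = w[0] * y1 + w[1] * y2 > m / 2
        errors += decide_one != bit           # count a decision that disagrees with the bit
    return errors / trials


rng = random.Random(1)
pe1_sim = simulate_pe(W1, 100_000, rng)
pe2_sim = simulate_pe(W2, 100_000, rng)
pe1 = qfunc(20 / (2 * math.sqrt(48)))         # m1 = ||s||^2 = 20, v1^2 = s^T C_N s = 48
pe2 = qfunc(math.sqrt(52 / 3) / 2)            # m2 = v2^2 = s^T C_N^{-1} s = 52/3
print(f"Z1: simulated {pe1_sim:.4f}, analytic {pe1:.4f}")
print(f"Z2: simulated {pe2_sim:.4f}, analytic {pe2:.4f}")
```

The analytic values come out near 0.074 and 0.019, which round to the 0.07 and 0.02 quoted in the example, and the simulated rates should agree with them to within Monte Carlo fluctuations.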


5.7  Random  Processes 

A  key  limitation  on  the  performance  of  communication  systems  comes  from  receiver  noise,  which 
is  an  unavoidable  physical  phenomenon  (see  Appendix  5.C).  Noise  cannot  be  modeled  as  a 
deterministic waveform (i.e., we do not know what noise waveform we will observe at any given
point in time). Indeed, neither can the desired signals in a communication system, even though
we have sometimes pretended otherwise in prior chapters. Information-bearing signals such
as speech, audio, and video are best modeled as being randomly chosen from a vast ensemble of
possibilities.  Similarly,  the  bit  stream  being  transmitted  in  a  digital  communication  system  can 
be  arbitrary,  and  can  therefore  be  thought  of  as  being  randomly  chosen  from  a  large  number  of 
possible  bit  streams.  It  is  time,  therefore,  to  learn  how  to  deal  with  random  processes,  which  is 
the  technical  term  we  use  for  signals  that  are  chosen  randomly  from  an  ensemble,  or  collection,  of 
possible  signals.  A  detailed  investigation  of  random  processes  is  well  beyond  our  scope,  and  our 
goal  here  is  limited  to  developing  a  working  understanding  of  concepts  critical  to  our  study  of 
communication  systems.  We  shall  see  that  this  goal  can  be  achieved  using  elementary  extensions 
of  the  probability  concepts  covered  earlier  in  this  chapter. 


5.7.1  Running  example:  sinusoid  with  random  amplitude  and  phase 

Let us work through a simple example before we embark on a systematic development. Suppose that $X_1$ and $X_2$ are i.i.d. $N(0, 1)$ random variables, and define

$$X(t) = X_1 \cos 2\pi f_c t - X_2 \sin 2\pi f_c t \tag{5.62}$$

where $f_c > 0$ is a fixed frequency. The waveform $X(t)$ is not a deterministic signal, since $X_1$ and $X_2$ can take random values on the real line. Indeed, for each time $t$, $X(t)$ is a random variable, since it is a linear combination of two random variables $X_1$ and $X_2$ defined on a common probability space. Moreover, if we pick a number of times $t_1, t_2, \ldots$, then the corresponding samples $X(t_1), X(t_2), \ldots$ are random variables on a common probability space.

Another interpretation of $X(t)$ is obtained by converting $(X_1, X_2)$ to polar form:

$$X_1 = A \cos \Theta, \quad X_2 = A \sin \Theta$$

For $X_1, X_2$ i.i.d. $N(0,1)$, we know from Problem 5.21 that $A$ is Rayleigh, $\Theta$ is uniform over $[0, 2\pi]$, and $A$, $\Theta$ are independent. The random process $X(t)$ can be rewritten as

$$X(t) = A \cos \Theta \cos 2\pi f_c t - A \sin \Theta \sin 2\pi f_c t = A \cos(2\pi f_c t + \Theta) \tag{5.63}$$

Thus, $X(t)$ is a sinusoid with random amplitude and phase.
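The equivalence of (5.62) and (5.63) can be checked numerically for a single outcome. The sketch below (Python, standard library only; the function names, the frequency $f_c = 1$ and the sampling grid are illustrative choices) draws one pair $(X_1, X_2)$, converts it to $(A, \Theta)$ with `math.hypot` and `math.atan2`, and evaluates the resulting sample path both ways.

```python
import math
import random


def path_cartesian(x1, x2, fc, times):
    """Sample path of (5.62): X1 cos(2 pi fc t) - X2 sin(2 pi fc t)."""
    return [x1 * math.cos(2 * math.pi * fc * t) - x2 * math.sin(2 * math.pi * fc * t)
            for t in times]


def path_polar(x1, x2, fc, times):
    """Same sample path via (5.63): A cos(2 pi fc t + Theta), (A, Theta) = polar form of (X1, X2)."""
    a = math.hypot(x1, x2)                        # amplitude A
    theta = math.atan2(x2, x1) % (2 * math.pi)    # phase Theta mapped to [0, 2 pi)
    return [a * math.cos(2 * math.pi * fc * t + theta) for t in times]


rng = random.Random(0)
fc = 1.0                                      # illustrative frequency
times = [k / 100 for k in range(300)]         # three periods of the sinusoid
x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)     # one outcome, hence one sample path
cart = path_cartesian(x1, x2, fc, times)
polar = path_polar(x1, x2, fc, times)
max_diff = max(abs(c - p) for c, p in zip(cart, polar))
print(f"A = {math.hypot(x1, x2):.3f}, max difference between forms = {max_diff:.2e}")
```

The difference is at the level of floating point rounding, as expected from the trigonometric identity used in (5.63).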

For a given time $t$, what is the distribution of $X(t)$? Since $X(t)$ is a linear combination of i.i.d. Gaussian, hence jointly Gaussian, random variables $X_1$ and $X_2$, we infer that it is a Gaussian random variable. Its distribution is therefore specified by computing its mean and variance, as follows:

$$E[X(t)] = E[X_1] \cos 2\pi f_c t - E[X_2] \sin 2\pi f_c t = 0 \tag{5.64}$$

$$\begin{aligned}
\mathrm{var}(X(t)) &= \mathrm{cov}(X_1 \cos 2\pi f_c t - X_2 \sin 2\pi f_c t,\; X_1 \cos 2\pi f_c t - X_2 \sin 2\pi f_c t) \\
&= \mathrm{cov}(X_1, X_1) \cos^2 2\pi f_c t + \mathrm{cov}(X_2, X_2) \sin^2 2\pi f_c t - 2\,\mathrm{cov}(X_1, X_2) \cos 2\pi f_c t \sin 2\pi f_c t \\
&= \cos^2 2\pi f_c t + \sin^2 2\pi f_c t = 1
\end{aligned} \tag{5.65}$$

using $\mathrm{cov}(X_i, X_i) = \mathrm{var}(X_i) = 1$, $i = 1, 2$, and $\mathrm{cov}(X_1, X_2) = 0$ (since $X_1$, $X_2$ are independent). Thus, we have $X(t) \sim N(0, 1)$ for any $t$.

In this particular example, we can also easily specify the joint distribution of any set of $n$ samples, $X(t_1), \ldots, X(t_n)$, where $n$ can be arbitrarily chosen. The samples are jointly Gaussian, since they are linear combinations of the jointly Gaussian random variables $X_1, X_2$. Thus, we only need to specify their means and pairwise covariances. We have just shown that the means are zero, and that the diagonal entries of the covariance matrix are one. More generally, the covariance of any two samples can be computed as follows:

$$\begin{aligned}
\mathrm{cov}(X(t_i), X(t_j)) &= \mathrm{cov}(X_1 \cos 2\pi f_c t_i - X_2 \sin 2\pi f_c t_i,\; X_1 \cos 2\pi f_c t_j - X_2 \sin 2\pi f_c t_j) \\
&= \mathrm{cov}(X_1, X_1) \cos 2\pi f_c t_i \cos 2\pi f_c t_j + \mathrm{cov}(X_2, X_2) \sin 2\pi f_c t_i \sin 2\pi f_c t_j \\
&\quad - \mathrm{cov}(X_1, X_2) \left(\cos 2\pi f_c t_i \sin 2\pi f_c t_j + \sin 2\pi f_c t_i \cos 2\pi f_c t_j\right) \\
&= \cos 2\pi f_c t_i \cos 2\pi f_c t_j + \sin 2\pi f_c t_i \sin 2\pi f_c t_j \\
&= \cos 2\pi f_c (t_i - t_j)
\end{aligned} \tag{5.66}$$
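The ensemble averages (5.64)-(5.66) can be approximated by averaging across many independently drawn sample paths. The following Monte Carlo sketch (Python, standard library only) is an illustration under assumed parameters: $f_c = 1$, a few arbitrary sampling-time pairs, and helper names chosen here for convenience. Since the mean is zero, the average of $X(t_i)X(t_j)$ estimates the covariance directly; the pair with $t_i = t_j$ checks the unit variance in (5.65).

```python
import math
import random


def x_at(x1, x2, fc, t):
    """Value at time t of the sample path X(t) = X1 cos(2 pi fc t) - X2 sin(2 pi fc t)."""
    return x1 * math.cos(2 * math.pi * fc * t) - x2 * math.sin(2 * math.pi * fc * t)


def ensemble_moments(pairs, fc, trials, seed=0):
    """Monte Carlo estimates of E[X(t_i)] and E[X(t_i) X(t_j)] for each (t_i, t_j) pair."""
    rng = random.Random(seed)
    mean_acc = [0.0] * len(pairs)
    corr_acc = [0.0] * len(pairs)
    for _ in range(trials):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)     # new outcome, new sample path
        for k, (ti, tj) in enumerate(pairs):
            a, b = x_at(x1, x2, fc, ti), x_at(x1, x2, fc, tj)
            mean_acc[k] += a
            corr_acc[k] += a * b
    return [m / trials for m in mean_acc], [c / trials for c in corr_acc]


FC = 1.0
PAIRS = [(0.3, 0.3), (0.0, 0.125), (0.2, 0.7), (1.1, 0.35)]   # illustrative sampling times
means, covs = ensemble_moments(PAIRS, FC, trials=100_000, seed=2)
theory = [math.cos(2 * math.pi * FC * (ti - tj)) for ti, tj in PAIRS]
for (ti, tj), c, th in zip(PAIRS, covs, theory):
    print(f"t_i={ti}, t_j={tj}: estimate {c:+.3f}, cos 2 pi fc (t_i - t_j) = {th:+.3f}")
```

With $10^5$ paths the estimates typically land within about 0.01 of $\cos 2\pi f_c (t_i - t_j)$, and the estimated means are close to zero.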


While we have so far discussed the random process $X(t)$ from a statistical point of view, for fixed values of $X_1$ and $X_2$, we see that $X(t)$ is actually a deterministic signal. Specifically, if the random vector $(X_1, X_2)$ is defined over a probability space $\Omega$, a particular outcome $\omega \in \Omega$ maps to a particular realization $(X_1(\omega), X_2(\omega))$. This in turn maps to a deterministic “realization,” or “sample path,” of $X(t)$, which we denote as $X(t, \omega)$:

$$X(t, \omega) = X_1(\omega) \cos 2\pi f_c t - X_2(\omega) \sin 2\pi f_c t$$

To see what these sample paths look like, it is easiest to refer to the polar form (5.63):

$$X(t, \omega) = A(\omega) \cos(2\pi f_c t + \Theta(\omega))$$

Thus, as shown in Figure 5.16, different sample paths have different amplitudes, drawn from a Rayleigh distribution, along with phase shifts drawn from a uniform distribution.


5.7.2  Basic  definitions 

As we have seen earlier, a random vector $\mathbf{X} = (X_1, \ldots, X_n)^T$ is a finite collection of random variables defined on a common probability space, as depicted in Figure 5.7. A random process is simply a generalization of this concept, where the number of such random variables can be infinite.

Random process: A random process $X$ is a collection of random variables $\{X(t), t \in T\}$, where the index set $T$ can be finite, countably infinite, or uncountably infinite. When we interpret the index set as denoting time, as we often do for the scenarios of interest to us, a countable index set corresponds to a discrete time random process, and an uncountable index set corresponds to a continuous time random process. We denote by $X(t, \omega)$ the value taken by the random variable $X(t)$ for any given outcome $\omega$ in the sample space.
Figure 5.16: Two sample paths for a sinusoid with random amplitude and phase.


For the sinusoid with random amplitude and phase, the sample space only needs to be rich enough to support the two random variables $X_1$ and $X_2$ (or $A$ and $\Theta$), from which we can create a continuum of random variables $X(t, \omega)$, $-\infty < t < \infty$:

$$\omega \rightarrow (X_1(\omega), X_2(\omega)) \rightarrow X(t, \omega)$$

In general, however, the source of randomness can be much richer. Noise in a receiver circuit is caused by random motion of a large number of charge carriers. A digitally modulated waveform depends on a sequence of randomly chosen bits. The preceding conceptual framework is general enough to cover all such scenarios.

Sample paths: We can also interpret a random process as a signal drawn at random from an ensemble, or collection, of possible signals. The signal we get at a particular random draw is called a sample path, or realization, of the random process. Once we fix a sample path, it can be treated like a deterministic signal. Specifically, for each fixed outcome $\omega \in \Omega$, the sample path is $X(t, \omega)$, which varies only with $t$. We have already seen examples of sample paths for our running example in Figure 5.16.

Finite-dimensional distributions: As indicated in Figure 5.17, the samples $X(t_1), \ldots, X(t_n)$ from a random process $X$ are mappings from a common sample space to the real line, with $X(t_i, \omega)$ denoting the value of the random variable $X(t_i)$ for outcome $\omega \in \Omega$. The joint distribution of these random variables depends on the underlying probability measure on the sample space $\Omega$. We say that we “know” the statistics of a random process if we know the joint statistics of an arbitrarily chosen finite collection of samples. That is, we know the joint distribution of the samples $X(t_1), \ldots, X(t_n)$, regardless of the number of samples $n$ and the sampling times $t_1, \ldots, t_n$. These joint distributions are called the finite-dimensional distributions of the random process, with the joint distribution of $n$ samples called an $n$th order distribution. Thus, while a random process may consist of infinitely many random variables, when we specify its statistics, we focus on a finite subset of these random variables.

For our running example (5.62), we observed that the samples are jointly Gaussian, and specified the joint distribution by computing the means and covariances. This is a special case of a broader class of Gaussian random processes (to be defined shortly) for which it is possible to characterize finite-dimensional distributions compactly in this fashion. Often, however, it is not possible to explicitly specify such distributions, but we can still compute useful quantities averaged across sample paths.

Figure 5.17: Samples of a random process are random variables defined on a common probability space. (Diagram: each outcome $\omega$ in the sample space $\Omega$ is mapped to the values $X(t_1, \omega), X(t_2, \omega), \ldots, X(t_n, \omega)$ of the samples $X(t_1), X(t_2), \ldots, X(t_n)$.)

Ensemble averages: Knowing the finite-dimensional distributions enables us to compute statistical averages across the collection, or ensemble, of sample paths. Such averages are called ensemble averages. We will be mainly interested in “second order” statistics (involving expectations of products of at most two random variables), such as means and covariances. We define these quantities in sufficient generality that they apply to complex-valued random processes, but specialize to real-valued random processes in most of our computations.


5.7.3  Second  order  statistics 

Mean, autocorrelation, and autocovariance functions (ensemble averages): For a random process $X(t)$, the mean function is defined as

$$m_X(t) = E[X(t)] \tag{5.67}$$

and the autocorrelation function as

$$R_X(t_1, t_2) = E[X(t_1) X^*(t_2)] \tag{5.68}$$

Note that $R_X(t, t) = E[|X(t)|^2]$ is the instantaneous power at time $t$. The autocovariance function of $X$ is the autocorrelation function of the zero-mean version of $X$, and is given by

$$C_X(t_1, t_2) = E[(X(t_1) - E[X(t_1)])(X(t_2) - E[X(t_2)])^*] = R_X(t_1, t_2) - m_X(t_1) m_X^*(t_2) \tag{5.69}$$

Second order statistics for running example: We have from (5.64) and (5.66) that

$$m_X(t) = 0, \quad C_X(t_1, t_2) = R_X(t_1, t_2) = \cos 2\pi f_c (t_1 - t_2) \tag{5.70}$$

It is interesting to note that the mean function does not depend on $t$, and that the autocorrelation and autocovariance functions depend only on the difference of the times $t_1 - t_2$. This implies that if we shift $X(t)$ by some time delay $d$, the shifted process $\tilde{X}(t) = X(t - d)$ would have the same mean and autocorrelation functions. Such translation invariance of statistics is interesting and important enough to merit a formal definition, which we provide next.
5.7.4 Wide Sense Stationarity and Stationarity

Wide sense stationary (WSS) random process: A random process $X$ is said to be WSS if

$$m_X(t) = m_X(0) \text{ for all } t$$

and

$$R_X(t_1, t_2) = R_X(t_1 - t_2, 0) \text{ for all } t_1, t_2$$

In this case, we change notation, dropping the time dependence in the notation for the mean $m_X$, and expressing the autocorrelation function as a function of $\tau = t_1 - t_2$ alone. Thus, for a WSS process, we can define the autocorrelation function as

$$R_X(\tau) = E[X(t) X^*(t - \tau)] \quad \text{for } X \text{ WSS} \tag{5.71}$$

with the understanding that the expectation is independent of $t$. Since the mean is independent of time and the autocorrelation depends only on time differences, the autocovariance also depends only on time differences, and is given by

$$C_X(\tau) = R_X(\tau) - |m_X|^2 \quad \text{for } X \text{ WSS} \tag{5.72}$$

Second order statistics for running example (new notation): With this new notation, we have

$$m_X = 0, \quad R_X(\tau) = C_X(\tau) = \cos 2\pi f_c \tau \tag{5.73}$$

A  WSS  random  process  has  shift-invariant  second  order  statistics.  An  even  stronger  notion  of 
shift-invariance  is  stationarity. 

Stationary random process: A random process $X(t)$ is said to be stationary if it is statistically indistinguishable from a delayed version of itself. That is, $X(t)$ and $X(t - d)$ have the same statistics for any delay $d \in (-\infty, \infty)$.

Running example: The sinusoid with random amplitude and phase in our running example is stationary. To see this, it is convenient to consider the polar form in (5.63): $X(t) = A \cos(2\pi f_c t + \Theta)$, where $\Theta$ is uniformly distributed over $[0, 2\pi]$. Note that

$$Y(t) = X(t - d) = A \cos(2\pi f_c (t - d) + \Theta) = A \cos(2\pi f_c t + \Theta')$$

where $\Theta' = \Theta - 2\pi f_c d$ modulo $2\pi$ is uniformly distributed over $[0, 2\pi]$ and, like $\Theta$, independent of $A$. Thus, $X$ and $Y$ are statistically indistinguishable.
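The key step above is that a uniform phase stays uniform after a deterministic shift modulo $2\pi$. A quick histogram check (Python, standard library only; the delay $d = 0.37$, $f_c = 1$, eight bins, and the helper name are arbitrary illustrative choices) is sketched below.

```python
import math
import random


def shifted_phase_histogram(fc, d, trials, bins=8, seed=0):
    """Relative frequencies of Theta' = (Theta - 2 pi fc d) mod 2 pi over equal-width bins of [0, 2 pi)."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(trials):
        theta = rng.uniform(0, 2 * math.pi)                    # Theta uniform over [0, 2 pi]
        theta_shift = (theta - 2 * math.pi * fc * d) % (2 * math.pi)
        idx = min(int(theta_shift / (2 * math.pi) * bins), bins - 1)   # guard against rounding to 2 pi
        counts[idx] += 1
    return [c / trials for c in counts]


freqs = shifted_phase_histogram(fc=1.0, d=0.37, trials=80_000)
print([round(p, 3) for p in freqs])   # each bin should hold roughly 1/8 of the samples
```

Any other choice of $d$ gives the same flat histogram, up to sampling fluctuations.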

Stationarity implies wide sense stationarity: For a stationary random process $X$, the mean function satisfies

$$m_X(t) = m_X(t - d)$$

for any $t$, regardless of the value of $d$. Choosing $d = t$, we infer that

$$m_X(t) = m_X(0) \tag{5.74}$$

That is, the mean function is a constant. Similarly, the autocorrelation function satisfies

$$R_X(t_1, t_2) = R_X(t_1 - d, t_2 - d)$$

for any $t_1, t_2$, regardless of the value of $d$. Setting $d = t_2$, we have that

$$R_X(t_1, t_2) = R_X(t_1 - t_2, 0) \tag{5.75}$$

Thus, a stationary process is also WSS.

While  our  running  example  was  easy  to  analyze,  in  general,  stationarity  is  a  stringent  requirement 
that  is  not  easy  to  verify.  For  our  needs,  the  weaker  concept  of  wide  sense  stationarity  typically 
suffices. Further, we are often interested in Gaussian random processes (defined shortly), for
which  wide  sense  stationarity  actually  implies  stationarity. 
5.7.5 Power Spectral Density

Figure 5.18: Operational definition of PSD for a sample path $x(t)$.


We defined the concept of power spectral density (PSD), which specifies how the power in a signal is distributed in different frequency bands, for deterministic signals in Chapter 2. This deterministic framework directly applies to a given sample path of a random process, and indeed, this is what we did when we computed the PSD of digitally modulated signals in Chapter 4. While we did not mention the term “random process” then (for the good reason that we had not introduced it yet), if we model the information encoded into a digitally modulated signal as random, then the latter is indeed a random process. Let us now begin by restating the definition of PSD from Chapter 2.


Power Spectral Density: The power spectral density (PSD), $S_x(f)$, for a finite-power signal $x(t)$, which we can now think of as a sample path of a random process, is defined through the conceptual measurement depicted in Figure 5.18. Pass $x(t)$ through an ideal narrowband filter with transfer function

$$H_\nu(f) = \begin{cases} 1, & \nu - \frac{\Delta f}{2} < f < \nu + \frac{\Delta f}{2} \\ 0, & \text{else} \end{cases}$$

The PSD evaluated at $\nu$, $S_x(\nu)$, is defined as the measured power at the filter output, divided by the filter width $\Delta f$ (in the limit as $\Delta f \rightarrow 0$).

The  power  meter  in  Figure  5.18  is  averaging  over  time  to  estimate  the  power  in  a  frequency  slice 
of  a  particular  sample  path.  Let  us  review  how  this  is  done  before  discussing  how  to  average 
across sample paths to define PSD in terms of an ensemble average.

Periodogram-based PSD estimation: The PSD can be estimated by computing the Fourier transform over a finite observation interval, and dividing its magnitude squared (which is the energy spectral density) by the length of the observation interval. The time-windowed version of $x$ is defined as

$$x_{T_o}(t) = x(t)\, I_{[-\frac{T_o}{2}, \frac{T_o}{2}]}(t) \tag{5.76}$$

where $T_o$ is the length of the observation interval. The Fourier transform of $x_{T_o}(t)$ is denoted as

$$X_{T_o}(f) = \mathcal{F}(x_{T_o})$$

The energy spectral density of $x_{T_o}$ is therefore $|X_{T_o}(f)|^2$, and the PSD estimate is given by

$$\hat{S}_x(f) = \frac{|X_{T_o}(f)|^2}{T_o} \tag{5.77}$$
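A minimal numerical version of the estimate (5.77) is sketched below in Python (standard library only). It approximates the Fourier integral of the windowed sample path by a Riemann sum over midpoints of $[-T_o/2, T_o/2]$; the sample path parameters ($A = 1.3$, $\theta = 0.8$, $f_c = 5$ Hz), the interval $T_o = 10$ s, the step size, and the helper name `periodogram` are all illustrative assumptions. The interval is chosen to contain an integer number of carrier periods, so that the estimate at $f_c$ equals $A^2 T_o/4$ up to rounding, while the estimate at 7 Hz is essentially zero.

```python
import cmath
import math


def periodogram(samples, times, dt, freqs):
    """PSD estimate (5.77), |X_To(f)|^2 / To, with the Fourier integral replaced by a Riemann sum."""
    t_obs = len(samples) * dt                       # observation interval To
    estimates = []
    for f in freqs:
        x_f = dt * sum(x * cmath.exp(-2j * math.pi * f * t) for x, t in zip(samples, times))
        estimates.append(abs(x_f) ** 2 / t_obs)
    return estimates


A, THETA, FC = 1.3, 0.8, 5.0                        # illustrative sample path A cos(2 pi fc t + theta)
T_OBS, DT = 10.0, 1e-3
n = round(T_OBS / DT)
times = [-T_OBS / 2 + (k + 0.5) * DT for k in range(n)]   # midpoints covering [-To/2, To/2]
samples = [A * math.cos(2 * math.pi * FC * t + THETA) for t in times]
s_hat = periodogram(samples, times, DT, [FC, 7.0])
print(f"estimate at fc: {s_hat[0]:.4f} (A^2 To / 4 = {A**2 * T_OBS / 4:.4f})")
print(f"estimate at 7 Hz: {s_hat[1]:.2e}")
```

Note that the peak grows linearly with $T_o$ while its width shrinks, which is how the estimate approaches the impulses at $\pm f_c$ in the large-window limit.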


PSD for a sample path: Formally, we define the PSD for a sample path in the limit of large time windows as follows:

$$S_x(f) = \lim_{T_o \rightarrow \infty} \frac{|X_{T_o}(f)|^2}{T_o} \quad \text{PSD for sample path} \tag{5.78}$$
The preceding definition involves time averaging across a sample path, and can be related to the time-averaged autocorrelation function, defined as follows.

Time-averaged autocorrelation function for a sample path: For a sample path $x(t)$, we define the time-averaged autocorrelation function as

$$R_x(\tau) = \overline{x(t) x^*(t - \tau)} = \lim_{T_o \rightarrow \infty} \frac{1}{T_o} \int_{-\frac{T_o}{2}}^{\frac{T_o}{2}} x(t) x^*(t - \tau)\, dt$$


We  now  state  the  following  important  result. 

Time-averaged PSD and autocorrelation function form a Fourier transform pair.

$$S_x(f) \leftrightarrow R_x(\tau) \tag{5.79}$$

We  omit  the  proof,  but  the  result  can  be  derived  using  the  techniques  of  Chapter  2. 

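Time-averaged PSD and autocorrelation function for running example: For our random sinusoid (5.63), the time-averaged autocorrelation function is given by

$$\begin{aligned}
R_x(\tau) &= \overline{A \cos(2\pi f_c t + \Theta)\, A \cos(2\pi f_c (t - \tau) + \Theta)} \\
&= \frac{A^2}{2}\, \overline{\cos 2\pi f_c \tau + \cos(4\pi f_c t - 2\pi f_c \tau + 2\Theta)} \\
&= \frac{A^2}{2} \cos 2\pi f_c \tau
\end{aligned} \tag{5.80}$$

The time-averaged PSD is given by

$$S_x(f) = \frac{A^2}{4} \delta(f - f_c) + \frac{A^2}{4} \delta(f + f_c) \tag{5.81}$$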
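The time average in (5.80) can be approximated for one sample path by a Riemann sum over a long window. In the sketch below (Python, standard library only; the path parameters, window $T_o = 10$ s, step size, lags $\tau$, and the helper name are illustrative assumptions), the window spans an integer number of periods of the double-frequency term $\cos(4\pi f_c t - 2\pi f_c \tau + 2\Theta)$, so that term averages out and the result matches $\frac{A^2}{2}\cos 2\pi f_c \tau$ up to rounding.

```python
import math


def time_avg_autocorr(x, t_obs, dt, tau):
    """Riemann-sum approximation of (1/To) * integral_{-To/2}^{To/2} x(t) x(t - tau) dt for a real path x."""
    n = round(t_obs / dt)
    total = 0.0
    for k in range(n):
        t = -t_obs / 2 + (k + 0.5) * dt          # midpoint of the k-th subinterval
        total += x(t) * x(t - tau)
    return total * dt / t_obs


A, THETA, FC = 1.3, 0.8, 5.0                     # one illustrative sample path


def path(t):
    """x(t) = A cos(2 pi fc t + theta)."""
    return A * math.cos(2 * math.pi * FC * t + THETA)


taus = [0.0, 0.05, 0.1, 0.13]
results = [(tau, time_avg_autocorr(path, t_obs=10.0, dt=1e-3, tau=tau)) for tau in taus]
for tau, est in results:
    print(f"tau={tau}: time average {est:+.6f}, (A^2/2) cos(2 pi fc tau) = "
          f"{A**2 / 2 * math.cos(2 * math.pi * FC * tau):+.6f}")
```

The answer depends on the particular amplitude $A$ of the path but not on its phase, which foreshadows the ergodicity discussion below.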


We  now  extend  the  concept  of  PSD  to  a  statistical  average  as  follows. 


Ensemble-averaged PSD: The ensemble-averaged PSD for a random process is defined as follows:

$$S_X(f) = \lim_{T_o \rightarrow \infty} E\left[\frac{|X_{T_o}(f)|^2}{T_o}\right] \quad \text{ensemble-averaged PSD} \tag{5.82}$$

That is, we take the expectations of the PSD estimates computed over an observation interval, and then let the observation interval get large.

Potential notational confusion: We use capital letters (e.g., $X(t)$) to denote a random process and small letters (e.g., $x(t)$) to denote sample paths. However, we also use capital letters to denote the Fourier transform of a time domain signal (e.g., $s(t) \leftrightarrow S(f)$), as introduced in Chapter 2. Rather than introducing additional notation to resolve this potential ambiguity, we rely on context to clarify the situation. In particular, (5.82) illustrates this potential problem. On the left-hand side, we use $X$ to denote the random process whose PSD $S_X(f)$ we are interested in. On the right-hand side, we use $X_{T_o}(f)$ to denote the Fourier transform of a windowed sample path $x_{T_o}(t)$. Such opportunities for confusion arise seldom enough that it is not worth complicating our notation to avoid them.


A  result  analogous  to  (5.79)  holds  for  ensemble-averaged  quantities  as  well. 

Ensemble-averaged PSD and autocorrelation function for WSS processes form a Fourier transform pair (Wiener-Khintchine theorem). For a WSS process $X$ with autocorrelation function $R_X(\tau)$, the ensemble-averaged PSD is the Fourier transform of the ensemble-averaged autocorrelation function:

$$S_X(f) = \mathcal{F}(R_X(\tau)) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j 2\pi f \tau}\, d\tau \tag{5.83}$$
This result is called the Wiener-Khintchine theorem, and can be proved under mild conditions on the autocorrelation function (the area under $|R_X(\tau)|$ must be finite and its Fourier transform must exist). The proof requires advanced probability concepts beyond our scope here, and is omitted.

Ensemble-averaged PSD for running example: For our running example, the PSD is obtained by taking the Fourier transform of (5.73):

$$S_X(f) = \frac{1}{2}\delta(f - f_c) + \frac{1}{2}\delta(f + f_c) \tag{5.84}$$

That is, the power in $X$ is concentrated at $\pm f_c$, as we would expect for a sinusoidal signal at frequency $f_c$.

Power: It follows from the Wiener-Khintchine theorem that the power of $X$ can be obtained either by integrating the PSD or by evaluating the autocorrelation function at $\tau = 0$:

$$P_X = R_X(0) = \int_{-\infty}^{\infty} S_X(f)\, df \tag{5.85}$$

For our running example, we obtain from (5.73) or (5.84) that $P_X = 1$.

Ensemble versus Time Averages: For our running example, we computed the ensemble-averaged autocorrelation function $R_X(\tau)$ and then used the Wiener-Khintchine theorem to compute the PSD by taking the Fourier transform. At other times, it is convenient to apply the operational definition depicted in Figure 5.18, which involves averaging across time for a given sample path. If the two approaches give the same answer, then the random process is said to be ergodic in PSD. In practical terms, ergodicity means that designs based on statistical averages across sample paths can be expected to apply to individual sample paths, and that measurements carried out on a particular sample path can serve as a proxy for statistical averaging across multiple realizations.

Comparing (5.81) and (5.84), we see that our running example is actually not ergodic in PSD. For any sample path $x(t) = A \cos(2\pi f_c t + \theta)$, it is quite easy to show that

$$S_x(f) = \frac{A^2}{4} \delta(f - f_c) + \frac{A^2}{4} \delta(f + f_c) \tag{5.86}$$

Comparing with (5.84), we see that the time-averaged PSD varies across sample paths due to amplitude variations, with $A^2$ replaced by its expectation $E[A^2] = E[X_1^2 + X_2^2] = 2$ in the ensemble-averaged PSD.
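The lack of ergodicity can be made concrete by comparing the time-averaged power $R_x(0) = A^2/2$ of individual sample paths, from (5.80), with the ensemble-averaged power $P_X = 1$. The Python sketch below (standard library only; the number of paths, the seed, and the helper name are illustrative) draws $A^2 = X_1^2 + X_2^2$ for many paths.

```python
import random


def time_avg_powers(num_paths, seed=0):
    """Time-averaged power A^2/2 of each path A cos(2 pi fc t + theta), using A^2 = X1^2 + X2^2."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2) / 2 for _ in range(num_paths)]


powers = time_avg_powers(100_000, seed=3)
ensemble_power = sum(powers) / len(powers)
print(f"smallest and largest path power: {min(powers):.4f}, {max(powers):.2f}")
print(f"average across paths: {ensemble_power:.3f}")   # close to E[A^2]/2 = 1
```

Individual paths have powers ranging from nearly zero to many times the ensemble value, so a measurement on one path says little about $P_X$, even though the average across paths recovers it.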

Intuitively  speaking,  ergodicity  requires  sufficient  richness  of  variation  across  time  and  sample 
paths.  While  this  is  not  present  in  our  simple  running  example  (a  randomly  chosen  amplitude 
which is fixed across the entire sample path is the culprit), it is often present in the more
complicated  random  processes  of  interest  to  us,  including  receiver  noise  and  digitally  modulated 
signals  (under  appropriate  conditions  on  the  transmitted  symbol  sequences).  When  ergodicity 
holds,  we  have  our  choice  of  using  either  time  averaging  or  ensemble  averaging  for  computations, 
depending  on  which  is  most  convenient  or  insightful. 

The  autocorrelation  function  and  PSD  must  satisfy  the  following  structural  properties  (these 
apply  to  ensemble  averages  for  WSS  processes,  as  well  as  to  time  averages,  although  our  notation 
corresponds  to  ensemble  averages). 

Structural  properties  of  PSD  and  autocorrelation  function 

(P1) $S_X(f) \geq 0$ for all $f$.

This follows from the sample path based definition in Figure 5.18, since the output of the power meter is always nonnegative. Averaging across sample paths preserves this property.
(P2a) The autocorrelation function is conjugate symmetric: $R_X(\tau) = R_X^*(-\tau)$.

This follows quite easily from the definition (5.71). By setting $t = u + \tau$, we have

$$R_X(\tau) = E[X(u + \tau) X^*(u)] = \left(E[X(u) X^*(u + \tau)]\right)^* = R_X^*(-\tau)$$

(P2b) For real-valued $X$, both the autocorrelation function and PSD are symmetric and real-valued: $S_X(f) = S_X(-f)$ and $R_X(\tau) = R_X(-\tau)$.

(This is left as an exercise.)

Any function $g(\tau) \leftrightarrow G(f)$ must satisfy these properties in order to be a valid autocorrelation function/PSD.

Example 5.7.1 (Which function is an autocorrelation?) For each of the following functions, determine whether it is a valid autocorrelation function.

(a) $g_1(\tau) = \sin(\tau)$, (b) $g_2(\tau) = I_{[-1,1]}(\tau)$, (c) $g_3(\tau) =$

Solution

(a) This is not a valid autocorrelation function, since it is not symmetric and violates Property (P2b).

(b) This satisfies Property (P2b). However, $I_{[-1,1]}(\tau) \leftrightarrow 2\,\mathrm{sinc}(2f)$, so that Property (P1) is violated, since the sinc function can take negative values. Hence, the boxcar function cannot be a valid autocorrelation function. This example shows that the non-negativity Property (P1) places a stronger constraint on the validity of a proposed function as an autocorrelation function than the symmetry Property (P2).

(c) The function $g_3(\tau)$ is symmetric and satisfies Property (P2b). It is left as an exercise to check that $G_3(f) \geq 0$, hence Property (P1) is also satisfied.
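Part (b) can be checked numerically by computing the Fourier transform of the boxcar directly and comparing it with $2\,\mathrm{sinc}(2f)$, where $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$. The sketch below (Python, standard library only) uses a midpoint-rule integral; because $g_2$ is real and even, only the cosine part of $e^{-j2\pi f\tau}$ contributes. The frequencies, step size, and helper names are illustrative. A negative value at $f = 0.75$ is enough to rule out $g_2$ as an autocorrelation function.

```python
import math


def fourier_transform_even(g, f, t_max, dt):
    """Midpoint-rule approximation of integral_{-t_max}^{t_max} g(tau) exp(-j 2 pi f tau) d tau.

    Assumes g is real and even, so the imaginary (sine) part vanishes and only the cosine part is summed.
    """
    n = round(2 * t_max / dt)
    total = 0.0
    for k in range(n):
        tau = -t_max + (k + 0.5) * dt
        total += g(tau) * math.cos(2 * math.pi * f * tau)
    return total * dt


def boxcar(tau):
    """g_2(tau) = I_[-1,1](tau)."""
    return 1.0 if -1.0 <= tau <= 1.0 else 0.0


def two_sinc_2f(f):
    """Closed form 2 sinc(2f) = sin(2 pi f) / (pi f), with value 2 at f = 0."""
    return 2.0 if f == 0 else math.sin(2 * math.pi * f) / (math.pi * f)


results = {f: (fourier_transform_even(boxcar, f, t_max=2.0, dt=1e-4), two_sinc_2f(f))
           for f in (0.0, 0.25, 0.75, 1.25)}
for f, (numeric, closed) in results.items():
    print(f"f={f}: numeric {numeric:+.4f}, 2 sinc(2f) = {closed:+.4f}")
```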

Units  for  PSD:  Power  per  unit  frequency  has  the  same  units  as  power  multiplied  by  time,  or 
energy.  Thus,  the  PSD  is  expressed  in  units  of  Watts/Hertz,  or  Joules. 


Figure 5.19: Operational definition of one-sided PSD. (Diagram: a real-valued signal $x(t)$ is passed through a filter with real-valued impulse response.)


One-sided PSD: The PSD that we have talked about so far is the two-sided PSD, which spans both positive and negative frequencies. For a real-valued $X$, we can restrict attention to positive frequencies alone in defining the PSD, by virtue of Property (P2b). This yields the one-sided PSD $S_X^+(f)$, defined as

$$S_X^+(f) = S_X(f) + S_X(-f) = 2 S_X(f), \quad f > 0, \quad (X(t) \text{ real}) \tag{5.87}$$

It is useful to interpret this in terms of the sample path based operational definition shown in Figure 5.19. The signal is passed through a physically realizable filter (i.e., with real-valued impulse response) of bandwidth $\Delta f$, centered around $\nu$. The filter transfer function must be conjugate symmetric, hence

$$H_\nu(f) = \begin{cases} 1, & \nu - \frac{\Delta f}{2} < f < \nu + \frac{\Delta f}{2} \\ 1, & -\nu - \frac{\Delta f}{2} < f < -\nu + \frac{\Delta f}{2} \\ 0, & \text{else} \end{cases}$$
The one-sided PSD is defined as the limit of the power of the filter output, divided by $\Delta f$, as $\Delta f \rightarrow 0$. Comparing Figures 5.18 and 5.19, we have that the sample path based one-sided PSD is simply twice the two-sided PSD: $S_x^+(f) = \left(S_x(f) + S_x(-f)\right) I_{\{f > 0\}} = 2 S_x(f)\, I_{\{f > 0\}}$.

One-sided PSD for running example: From (5.84), we obtain that

$$S_X^+(f) = \delta(f - f_c) \tag{5.88}$$

with all the power concentrated at $f_c$, as expected.

Power in terms of PSD: We can express the power of a real-valued random process in terms of either the one-sided or two-sided PSD:

$$P_X = \int_{-\infty}^{\infty} S_X(f)\, df = \int_0^{\infty} S_X^+(f)\, df \quad (\text{for } X \text{ real}) \tag{5.89}$$

Baseband  and  passband  random  processes:  A  random  process  X  is  baseband  if  its  PSD 
is  baseband,  and  is  passband  if  its  PSD  is  passband.  Thinking  in  terms  of  time  averaged  PSDs, 
which  are  based  on  the  Fourier  transform  of  time  windowed  sample  paths,  we  see  that  a  random 
process  is  baseband  if  its  sample  paths,  time  windowed  over  a  large  enough  observation  interval, 
are  (approximately)  baseband.  Similarly,  a  random  process  is  passband  if  its  sample  paths, 
time  windowed  over  a  large  enough  observation  interval,  are  (approximately)  passband.  The 
caveat  of  “large  enough  observation  interval”  is  inserted  because  of  the  following  consideration: 
timelimited  signals  cannot  be  strictly  bandlimited,  but  as  long  as  the  observation  interval  is  large 
enough, the time windowing (which corresponds to convolving the spectrum with a sinc function)
does not spread out the spectrum of the signal significantly. Thus, the PSD (which is obtained by
taking the limit of large observation intervals) also defines the frequency occupancy of the sample
paths  over  large  enough  observation  intervals.  Note  that  these  intuitions,  while  based  on  time 
averaged  PSDs,  also  apply  when  bandwidth  occupancy  is  defined  in  terms  of  ensemble-averaged 
PSDs,  as  long  as  the  random  process  is  ergodic  in  PSD. 


Figure 5.20: The relation between the PSDs of a message and the corresponding DSB-SC signal (panels: message PSD; PSD of DSB-SC signal).


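Example (PSD of a modulated passband signal): Consider a passband signal $u_p(t) = m(t)\cos 2\pi f_0 t$, where $m(t)$ is a message modeled as a baseband random process with PSD $S_m(f)$ and power $P_m$. Time-limiting to an interval of length $T_0$ and going to the frequency domain, we have

$$ U_{p,T_0}(f) = \frac{1}{2}\left(M_{T_0}(f - f_0) + M_{T_0}(f + f_0)\right) \qquad (5.90) $$

Taking the magnitude squared, dividing by $T_0$, and letting $T_0$ get large, we obtain (the cross terms vanish because, for $f_0$ larger than the message bandwidth, the two frequency-shifted copies do not overlap)

$$ S_{u_p}(f) = \frac{1}{4}\left(S_m(f - f_0) + S_m(f + f_0)\right) \qquad (5.91) $$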


An  example  is  shown  in  Figure  5.20. 
Thus, we start with the formula (5.90) relating the Fourier transforms for a given sample path, which is identical to what we had in Chapter 2 (except that we now need to time-limit the finite-power message to obtain a finite-energy signal), and obtain the relation (5.91) relating the PSDs. We can now integrate the PSDs to get

$$ P_{u_p} = \frac{1}{4}\left(P_m + P_m\right) = \frac{P_m}{2} $$
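The relation $P_{u_p} = P_m/2$ can be checked numerically with a short Python sketch assuming NumPy. The message here is a hypothetical stand-in (white Gaussian noise smoothed by a 20-tap moving average, so only approximately baseband), and the sampling rate and carrier frequency are arbitrary choices with the carrier well above the message bandwidth.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 10_000.0     # sampling rate in Hz (illustrative)
n = 200_000       # number of samples, so T0 = n / fs = 20 s
f0 = 2_000.0      # carrier frequency in Hz, well above the message bandwidth
t = np.arange(n) / fs

# Hypothetical message: white Gaussian noise through a 20-tap moving average
# (main lobe of its spectrum ends at fs / 20 = 500 Hz)
m = np.convolve(rng.standard_normal(n), np.ones(20) / 20, mode="same")
u_p = m * np.cos(2 * np.pi * f0 * t)   # DSB-SC signal

P_m = np.mean(m ** 2)    # time-averaged message power
P_u = np.mean(u_p ** 2)  # time-averaged modulated power
print(P_u / P_m)         # close to 1/2
```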


5.7.6  Gaussian  random  processes 

Gaussian  random  processes  are  just  generalizations  of  Gaussian  random  vectors  to  an  arbitrary 
number  of  components  (countable  or  uncountable). 

Gaussian random process: A random process $X = \{X(t), t \in T\}$ is said to be Gaussian if any linear combination of samples is a Gaussian random variable. That is, for any number $n$ of samples, any sampling times $t_1, \ldots, t_n$, and any scalar constants $a_1, \ldots, a_n$, the linear combination $a_1 X(t_1) + \cdots + a_n X(t_n)$ is a Gaussian random variable. Equivalently, the samples $X(t_1), \ldots, X(t_n)$ are jointly Gaussian.

Our running example (5.62) is a Gaussian random process, since any linear combination of samples is a linear combination of the jointly Gaussian random variables $X_1$ and $X_2$, and is therefore a Gaussian random variable.

A  linear  combination  of  samples  from  a  Gaussian  random  process  is  completely  characterized  by 
its  mean  and  variance.  To  compute  the  latter  quantities  for  an  arbitrary  linear  combination,  we 
can  show,  as  we  did  for  random  vectors,  that  all  we  need  to  know  are  the  mean  function  (analogous 
to  the  mean  vector)  and  the  autocovariance  function  (analogous  to  the  covariance  matrix)  of  the 
random  process.  These  functions  therefore  provide  a  complete  statistical  characterization  of  a 
Gaussian  random  process,  since  the  definition  of  a  Gaussian  random  process  requires  only  that 
we  be  able  to  characterize  the  distribution  of  an  arbitrary  linear  combination  of  samples. 

Characterizing a Gaussian random process: The statistics of a Gaussian random process are completely specified by its mean function $m_X(t) = E[X(t)]$ and its autocovariance function $C_X(t_1, t_2) = \mathrm{cov}(X(t_1), X(t_2)) = E[(X(t_1) - m_X(t_1))(X(t_2) - m_X(t_2))]$. Given the mean function, the autocorrelation function $R_X(t_1, t_2) = E[X(t_1)X(t_2)]$ can be computed from $C_X(t_1, t_2)$, and vice versa, using the following relation:

$$ R_X(t_1, t_2) = C_X(t_1, t_2) + m_X(t_1)\, m_X(t_2) \qquad (5.92) $$

It therefore also follows that a Gaussian random process is completely specified by its mean and autocorrelation functions.

WSS  Gaussian  random  processes  are  stationary:  We  know  that  a  stationary  random 
process  is  WSS.  The  converse  is  not  true  in  general,  but  Gaussian  WSS  processes  are  indeed 
stationary.  This  is  because  the  statistics  of  a  Gaussian  random  process  are  characterized  by  its 
first  and  second  order  statistics,  and  if  these  are  shift-invariant  (as  they  are  for  WSS  processes), 
the  random  process  is  statistically  indistinguishable  under  a  time  shift. 


Example 5.7.2 Suppose that $Y$ is a Gaussian random process with mean function $m_Y(t) = 3t$ and autocorrelation function $R_Y(t_1, t_2) = 4e^{-|t_1 - t_2|} + 9 t_1 t_2$.

(a) Find the probability that $Y(2)$ is bigger than 10.

(b) Specify the joint distribution of $Y(2)$ and $Y(3)$.

(c) True or False: $Y$ is stationary.

(d) True or False: The random process $Z(t) = Y(t) - 3t$ is stationary.

Solution: (a) Since $Y$ is a Gaussian random process, the sample $Y(2)$ is a Gaussian random variable with mean $m_Y(2) = 6$ and variance $C_Y(2, 2) = R_Y(2, 2) - (m_Y(2))^2 = 40 - 36 = 4$. More generally, note that the autocovariance function of $Y$ is given by

$$ C_Y(t_1, t_2) = R_Y(t_1, t_2) - m_Y(t_1)\, m_Y(t_2) = 4e^{-|t_1 - t_2|} + 9 t_1 t_2 - (3t_1)(3t_2) = 4e^{-|t_1 - t_2|} $$

so that $\mathrm{var}(Y(t)) = C_Y(t, t) = 4$ for any sampling time $t$.

We have shown that $Y(2) \sim N(6, 4)$, so that

$$ P[Y(2) > 10] = Q\left(\frac{10 - 6}{\sqrt{4}}\right) = Q(2) $$


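(b) Since $Y$ is a Gaussian random process, $Y(2)$ and $Y(3)$ are jointly Gaussian, with distribution specified by the mean vector and covariance matrix given by

$$ \mathbf{m} = \begin{pmatrix} m_Y(2) \\ m_Y(3) \end{pmatrix} = \begin{pmatrix} 6 \\ 9 \end{pmatrix}, \qquad \mathbf{C} = \begin{pmatrix} C_Y(2,2) & C_Y(2,3) \\ C_Y(3,2) & C_Y(3,3) \end{pmatrix} = \begin{pmatrix} 4 & 4e^{-1} \\ 4e^{-1} & 4 \end{pmatrix} $$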

(c)  Y  has  time-varying  mean,  and  hence  is  not  WSS.  This  implies  it  is  not  stationary.  The 
statement  is  therefore  False. 

(d) $Z(t) = Y(t) - 3t = Y(t) - m_Y(t)$ is the zero-mean version of $Y$. It inherits the Gaussianity of $Y$. The mean function is $m_Z(t) = 0$, and the autocorrelation function, given by

$$ R_Z(t_1, t_2) = E[(Y(t_1) - m_Y(t_1))(Y(t_2) - m_Y(t_2))] = C_Y(t_1, t_2) = 4e^{-|t_1 - t_2|} $$

depends on the time difference $t_1 - t_2$ alone. Thus, $Z$ is WSS. Since it is also Gaussian, this implies that $Z$ is stationary. The statement is therefore True.
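The computations in Example 5.7.2 can be reproduced with a small Python sketch assuming NumPy and the standard library, using the mean function $m_Y(t) = 3t$ and autocorrelation function $4e^{-|t_1 - t_2|} + 9t_1t_2$ of the example; $Q(x)$ is evaluated as $\frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$. The helper names are illustrative, and the Monte Carlo step, which draws $(Y(2), Y(3))$ from the joint Gaussian model of part (b), is an added cross-check rather than part of the original solution.

```python
import math
import numpy as np


def Q(x):
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def m_Y(t):
    return 3.0 * t


def R_Y(t1, t2):
    return 4.0 * math.exp(-abs(t1 - t2)) + 9.0 * t1 * t2


def C_Y(t1, t2):
    # Autocovariance from autocorrelation and mean, as in (5.92)
    return R_Y(t1, t2) - m_Y(t1) * m_Y(t2)


# (a) P[Y(2) > 10] = Q((10 - m_Y(2)) / sqrt(C_Y(2, 2)))
p_a = Q((10 - m_Y(2)) / math.sqrt(C_Y(2, 2)))

# (b) Mean vector and covariance matrix of (Y(2), Y(3))
times = [2.0, 3.0]
mean_vec = np.array([m_Y(t) for t in times])
cov_mat = np.array([[C_Y(a, b) for b in times] for a in times])

# Monte Carlo cross-check of (a) using the joint Gaussian model from (b)
rng = np.random.default_rng(1)
samples = rng.multivariate_normal(mean_vec, cov_mat, size=200_000)
p_mc = np.mean(samples[:, 0] > 10)
print(p_a, p_mc)
print(mean_vec, cov_mat, sep="\n")
```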


5.8  Noise  Modeling 


Figure 5.21: The PSD of passband white noise is flat over the band of interest (panels: two-sided PSD $S_{n_p}(f)$; one-sided PSD $S_{n_p}^+(f)$).


We now have the background required to discuss mathematical modeling of noise in communication systems. A generic model for receiver noise is that it is a random process with zero DC value, and with PSD which is flat, or white, over a band of interest. The key noise mechanisms in a communication receiver, thermal and shot noise, are both white, as discussed in Appendix 5.C. For example, Figure 5.21 shows the two-sided PSD of passband white noise $n_p(t)$, which is given by

$$ S_{n_p}(f) = \begin{cases} N_0/2, & |f - f_c| < B/2 \\ N_0/2, & |f + f_c| < B/2 \\ 0, & \text{else} \end{cases} $$
Since $n_p(t)$ is real-valued, we can also define the one-sided PSD as follows:

$$ S_{n_p}^+(f) = \begin{cases} N_0, & |f - f_c| < B/2 \\ 0, & \text{else} \end{cases} $$

That is, white noise has two-sided PSD $N_0/2$ and one-sided PSD $N_0$ over the band of interest. The power of the white noise is given by

$$ P_n = \int_{-\infty}^{\infty} S_{n_p}(f)\,df = \frac{N_0}{2} \cdot 2B = N_0 B $$

The PSD $N_0$ is in units of Watts/Hertz, or Joules.
Figure 5.22: The PSD of baseband white noise (panels: two-sided PSD; one-sided PSD).


Similarly,  Figure  5.22  shows  the  one-sided  and  two-sided  PSDs  for  real-valued  white  noise  in  a 
physical baseband system with bandwidth $B$. The power of this baseband white noise is again
$N_0 B$. As we discuss in Section 5.D, as with deterministic passband signals, passband random
processes  can  also  be  represented  in  terms  of  I  and  Q  components.  We  note  in  Section  5.D  that 
the  I  and  Q  components  of  passband  white  noise  are  baseband  white  noise  processes,  and  that 
the  corresponding  complex  envelope  is  complex-valued  white  noise. 

Noise Figure: The value of $N_0$ summarizes the net effects of white noise arising from various devices in the receiver. Comparing the noise power $N_0 B$ with the nominal figure of $kTB$ for thermal noise of a resistor with matched impedance, we define the noise figure as

$$ F = \frac{N_0}{k T_{room}} $$

where $k = 1.38 \times 10^{-23}$ Joules/Kelvin is Boltzmann's constant, and the nominal "room temperature" is taken by convention to be $T_{room} = 290$ Kelvin (the product $k T_{room} \approx 4 \times 10^{-21}$ Joules, so that the numbers work out well for this slightly chilly choice of room temperature, about 62.3° Fahrenheit). Noise figure is usually expressed in dB.

The noise power for a bandwidth $B$ is given by

$$ P_n = N_0 B = k T_{room}\, 10^{F(\mathrm{dB})/10}\, B $$


dBW and dBm: It is customary to express power on the decibel (dB) scale:

$$ \text{Power (dBW)} = 10 \log_{10}(\text{Power (watts)}) $$

$$ \text{Power (dBm)} = 10 \log_{10}(\text{Power (milliwatts)}) $$
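On the dB scale, the noise power over 1 Hz is therefore given by (since $k T_{room} \approx 4 \times 10^{-21}$ W/Hz $= 4 \times 10^{-18}$ mW/Hz, and $10 \log_{10}(4 \times 10^{-18}) \approx -174$)

$$ \text{Noise power over 1 Hz} = -174 + F \text{ dBm} \qquad (5.93) $$

with $F$ in dB. Thus, the noise power in dBm over a bandwidth of $B$ Hz is given by

$$ P_n(\mathrm{dBm}) = -174 + F + 10 \log_{10} B \text{ dBm} \qquad (5.94) $$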

Example 5.8.1 (Noise power computation) A 5 GHz Wireless Local Area Network (WLAN) link has a receiver bandwidth $B$ of 20 MHz. If the receiver has a noise figure of 6 dB, what is the receiver noise power $P_n$?

Solution: The noise power

$$ P_n = N_0 B = k T_{room}\, 10^{F(\mathrm{dB})/10}\, B = (1.38 \times 10^{-23})(290)(10^{6/10})(20 \times 10^6) $$

$$ = 3.2 \times 10^{-13} \text{ Watts} = 3.2 \times 10^{-10} \text{ milliWatts (mW)} $$

The noise power is often expressed in dBm, which is obtained by converting the raw number in milliWatts (mW) into dB. We therefore get

$$ P_{n,\mathrm{dBm}} = 10 \log_{10} P_n(\mathrm{mW}) = -95 \text{ dBm} $$

Let us now redo this computation in the "dB domain," where the contributions to the noise power due to the various system parameters simply add up. Using (5.93), the noise power in our system can be calculated as follows:

$$ P_n(\mathrm{dBm}) = -174 + \text{Noise Figure (dB)} + 10 \log_{10} \text{Bandwidth (Hz)} \qquad (5.95) $$

In our current example, we obtain $P_n(\mathrm{dBm}) = -174 + 6 + 73 = -95$ dBm, as before.
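Both routes in Example 5.8.1 amount to a few lines of arithmetic. A minimal Python sketch (standard library only; the function names are illustrative) using $k = 1.38 \times 10^{-23}$ J/K and $T_{room} = 290$ K as in the text:

```python
import math

K_BOLTZMANN = 1.38e-23  # Joules/Kelvin, the value used in the text
T_ROOM = 290.0          # Kelvin, conventional room temperature


def noise_power_watts(noise_figure_db, bandwidth_hz):
    """P_n = N0 * B = k * T_room * 10^(F/10) * B, with F in dB."""
    return K_BOLTZMANN * T_ROOM * 10 ** (noise_figure_db / 10) * bandwidth_hz


def watts_to_dbm(p_watts):
    """Convert watts to dBm (dB relative to 1 milliwatt)."""
    return 10 * math.log10(p_watts / 1e-3)


def noise_power_dbm_shortcut(noise_figure_db, bandwidth_hz):
    """The 'dB domain' rule (5.95): -174 + F(dB) + 10 log10 B."""
    return -174 + noise_figure_db + 10 * math.log10(bandwidth_hz)


# Example 5.8.1: 20 MHz bandwidth, 6 dB noise figure
p_n = noise_power_watts(6, 20e6)
print(f"{p_n:.2e} W")                              # 3.19e-13 W
print(round(watts_to_dbm(p_n)))                    # -95
print(round(noise_power_dbm_shortcut(6, 20e6)))    # -95
```

The two dBm values differ slightly before rounding (about -94.97 versus -94.99) because the shortcut rounds $10\log_{10}(k T_{room})$ to $-174$.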
Figure 5.23: Since receiver processing always involves some form of band limitation, it is not necessary to impose band limitation on the WGN model.

We now add two more features to our noise model that greatly simplify computations. First, we assume that the noise is a Gaussian random process. The physical basis for this is that noise arises due to the random motion of a large number of charge carriers, which leads to Gaussian statistics based on the central limit theorem (see Section 5.B). The mathematical consequence of Gaussianity is that we can compute probabilities based only on knowledge of second order statistics. Second, we remove band limitation, implicitly assuming that it will be imposed later by filtering at the receiver. That is, we model noise $n(t)$ (where $n$ can be real-valued passband or baseband white noise) as a zero mean WSS random process with PSD flat over the entire real line, $S_n(f) = N_0/2$. The corresponding autocorrelation function is $R_n(\tau) = \frac{N_0}{2}\delta(\tau)$. This model is clearly physically unrealizable, since the noise power is infinite. However, since receiver processing in bandlimited systems always involves filtering, we can assume that the receiver noise prior to filtering is not bandlimited and still get the right answer. Figure 5.23 shows the steps we use to go from receiver noise in bandlimited systems to infinite-power White Gaussian Noise (WGN), which we formally define below.

White Gaussian Noise: Real-valued WGN $n(t)$ is a zero mean, WSS, Gaussian random process with $S_n(f) = N_0/2 = \sigma^2$. Equivalently, $R_n(\tau) = \frac{N_0}{2}\delta(\tau) = \sigma^2\delta(\tau)$. The quantity $N_0/2 = \sigma^2$ is often termed the two-sided PSD of WGN, since we must integrate over both positive and negative frequencies in order to compute power using this PSD. The quantity $N_0$ is therefore referred to as the one-sided PSD, and has the dimension of Watts/Hertz, or Joules.

The  following  example  provides  a  preview  of  typical  computations  for  signaling  in  WGN,  and 
illustrates  why  the  model  is  so  convenient. 

Example 5.8.2 (On-off keying in continuous time): A receiver in an on-off keyed system receives the signal $y(t) = s(t) + n(t)$ if 1 is sent, and receives $y(t) = n(t)$ if 0 is sent, where $n(t)$ is WGN with PSD $\sigma^2 = N_0/2$. The receiver computes the following decision statistic:

$$ Y = \int y(t)\, s(t)\,dt $$

(We shall soon show that this is actually the best thing to do.)

(a) Find the conditional distribution of $Y$ if 0 is sent.

(b) Find the conditional distribution of $Y$ if 1 is sent.

(c) Compare with the on-off keying model in Example 5.6.3.

Solution:

(a) Conditioned on 0 being sent, $y(t) = n(t)$ and hence $Y = \int n(t)s(t)\,dt$. Since $n$ is Gaussian, and $Y$ is obtained from it by linear processing, $Y$ is a Gaussian random variable (conditioned on 0 being sent). Thus, the conditional distribution of $Y$ is completely characterized by its mean and variance, which we now compute.

$$ E[Y] = E\left[\int n(t)s(t)\,dt\right] = \int s(t)\, E[n(t)]\,dt = 0 $$

where we can interchange expectation and integration because both are linear operations. Actually, there are some mathematical conditions (beyond our scope here) that need to be satisfied for such "natural" interchanges to be permitted, but these conditions are met for all the examples that we consider in this text. Since the mean is zero, the variance is given by


$$ \mathrm{var}(Y) = E[Y^2] = E\left[\int n(t)s(t)\,dt \int n(u)s(u)\,du\right] $$

Notice that we have written out $Y^2 = Y \times Y$ as the product of two identical integrals, but with the "dummy" variables of integration chosen to be different. This is because we need to consider all possible cross terms that could result from multiplying the integral with itself. We now interchange expectation and integration again, noting that all random quantities must be grouped inside the expectation. This gives us

$$ \mathrm{var}(Y) = \int\!\!\int E[n(t)n(u)]\, s(t)s(u)\,dt\,du \qquad (5.96) $$


Now this is where the WGN model makes our life simple. The autocorrelation function is

$$ E[n(t)n(u)] = \sigma^2 \delta(t - u) $$

Plugging into (5.96), the delta function collapses the two integrals into one, and we obtain

$$ \mathrm{var}(Y) = \sigma^2 \int\!\!\int \delta(t - u)\, s(t)s(u)\,dt\,du = \sigma^2 \int s^2(t)\,dt = \sigma^2 \|s\|^2 $$

We have therefore shown that $Y \sim N(0, \sigma^2\|s\|^2)$ conditioned on 0 being sent.

(b) Suppose that 1 is sent. Then $y(t) = s(t) + n(t)$ and

$$ Y = \int (s(t) + n(t))\, s(t)\,dt = \int s^2(t)\,dt + \int n(t)s(t)\,dt = \|s\|^2 + \int n(t)s(t)\,dt $$

We already know that the second term on the extreme right-hand side has distribution $N(0, \sigma^2\|s\|^2)$. The distribution remains Gaussian when we add a constant to it, with the mean being translated by this constant. We therefore conclude that $Y \sim N(\|s\|^2, \sigma^2\|s\|^2)$, conditioned on 1 being sent.

(c) The decision statistic $Y$ obeys exactly the same model as in Example 5.6.3, with $m = \|s\|^2$ and noise variance $\sigma^2\|s\|^2$. Applying the intuitive decision rule in that example, we guess that 1 is sent if $Y > \|s\|^2/2$, and that 0 is sent otherwise. The probability of error for that decision rule equals

$$ Q\left(\frac{\|s\|^2/2}{\sigma\|s\|}\right) = Q\left(\frac{\|s\|}{2\sigma}\right) $$


Remark:  The  preceding  example  illustrates  that,  for  linear  processing  of  a  received  signal 
corrupted  by  WGN,  the  signal  term  contributes  to  the  mean,  and  the  noise  term  to  the  variance,  of 
the  resulting  decision  statistic.  The  resulting  Gaussian  distribution  is  a  conditional  distribution, 
because  it  is  conditioned  on  which  signal  is  actually  sent  (or,  for  on-off  keying,  whether  a  signal 
is  sent). 
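The conclusions of Example 5.8.2 can be checked by simulation. The following Python sketch assumes NumPy and approximates WGN with PSD $\sigma^2$ on a time grid of spacing $dt$ by independent samples of variance $\sigma^2/dt$ (a common discretization, adopted here as an assumption), with the correlator integral replaced by a Riemann sum. The pulse $s(t) = 2$ on $[0, 1)$ and $\sigma^2 = 1$ are hypothetical choices giving $\|s\|^2 = 4$, so the predicted error probability is $Q(\|s\|/(2\sigma)) = Q(1) \approx 0.159$.

```python
import math
import numpy as np


def Q(x):
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def simulate_ook(s, dt, sigma2, n_trials, rng):
    """Monte Carlo sketch of Example 5.8.2 on a time grid of spacing dt."""
    energy = np.sum(s ** 2) * dt                            # ||s||^2
    std = math.sqrt(sigma2 / dt)                            # std of discretized WGN samples
    y0 = rng.normal(0.0, std, size=(n_trials, s.size))      # 0 sent: y = n
    y1 = s + rng.normal(0.0, std, size=(n_trials, s.size))  # 1 sent: y = s + n
    Y0 = (y0 @ s) * dt                                      # decision statistic Y = int y s dt
    Y1 = (y1 @ s) * dt
    threshold = energy / 2                                  # decide 1 if Y > ||s||^2 / 2
    p_err = 0.5 * (np.mean(Y0 > threshold) + np.mean(Y1 <= threshold))
    return energy, Y0, Y1, p_err


dt = 0.01
s = 2.0 * np.ones(100)        # hypothetical pulse s(t) = 2 on [0, 1)
sigma2 = 1.0
rng = np.random.default_rng(0)
energy, Y0, Y1, p_err = simulate_ook(s, dt, sigma2, 10_000, rng)
print(np.mean(Y0), np.var(Y0))   # about 0 and sigma2 * ||s||^2 = 4
print(np.mean(Y1), np.var(Y1))   # about ||s||^2 = 4 and 4
print(p_err, Q(math.sqrt(energy) / (2 * math.sqrt(sigma2))))  # both about Q(1)
```

The empirical means and variances reflect the Remark: the signal sets the conditional mean and the noise sets the conditional variance.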

Complex baseband WGN: Based on the definition of complex envelope that we have used so far (in Chapters 2 through 4), the complex envelope has twice the energy/power of the corresponding passband signal (which may be a sample path of a passband random process). In order to get a unified description of WGN, however, let us now divide the complex envelope of both signal and noise by $\sqrt{2}$. This cannot change the performance of the system, but leads to the complex envelope now having the same energy/power as the corresponding passband signal. Effectively, we are switching from defining the complex envelope via $u_p(t) = \mathrm{Re}\left(u(t)e^{j2\pi f_c t}\right)$ to defining it via $u_p(t) = \mathrm{Re}\left(\sqrt{2}\,u(t)e^{j2\pi f_c t}\right)$. This convention reduces the PSDs of the I and Q components by a factor of two: we now model them as independent real WGN processes, with $S_{n_c}(f) = S_{n_s}(f) = N_0/2 = \sigma^2$. The steps in establishing this model are shown in Figure 5.24.
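The factor of two in the energy bookkeeping can be seen numerically. This Python sketch assumes NumPy; the sampling rate, carrier frequency, and Gaussian-shaped complex envelope $u(t)$ are hypothetical, chosen so that $u$ varies slowly relative to the carrier and the double-frequency term integrates to essentially zero.

```python
import numpy as np

fs = 100_000.0                    # sampling rate in Hz (illustrative)
fc = 10_000.0                     # carrier frequency in Hz (illustrative)
t = np.arange(0.0, 0.01, 1 / fs)
# Hypothetical complex envelope: Gaussian pulse of width ~1 ms, slowly varying relative to fc
u = (1.0 + 0.5j) * np.exp(-(((t - 0.005) / 0.001) ** 2))
carrier = np.exp(2j * np.pi * fc * t)


def energy(x):
    """Riemann-sum approximation of the integral of |x(t)|^2."""
    return np.sum(np.abs(x) ** 2) / fs


up_old = np.real(u * carrier)                # old convention: u_p = Re(u e^{j 2 pi fc t})
up_new = np.real(np.sqrt(2) * u * carrier)   # new convention: u_p = Re(sqrt(2) u e^{j 2 pi fc t})
print(energy(u) / energy(up_old))  # about 2: envelope has twice the passband energy
print(energy(u) / energy(up_new))  # about 1: envelope and passband energies match
```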

We now have the noise modeling background needed for Chapter 6, where we develop a framework for optimal reception, based on design criteria such as the error probability. The next section discusses linear processing of random processes, which is useful background for modeling the effect of filtering on noise, as well as for computing quantities such as signal-to-noise ratio (SNR). It can be skipped by readers anxious to get to Chapter 6, since the latter includes a self-contained exposition of the effects of the relevant receiver operations on WGN.
Figure 5.24: We scale the complex envelope for both signal and noise by $1/\sqrt{2}$ so that the I and Q components of passband WGN can be modeled as independent WGN processes with PSD $N_0/2$.


5.9  Linear  Operations  on  Random  Processes 

We now wish to understand what happens when we perform linear operations such as filtering and correlation on a random process. We have already seen an example of this in Example 5.8.2, where WGN was correlated against a deterministic signal. We now develop a more general framework.

It  is  useful  to  state  up  front  the  following  result. 

Gaussianity is preserved under linear operations: In particular, if the input to a filter is a Gaussian random process, so is the output.

This is because any set of output samples can be expressed as a linear combination of input samples, or the limit of such linear combinations (an integral for computing, for example, a convolution, is the limit of a sum).

In the remainder of this section, we discuss the evolution of second order statistics under linear operations. Of course, for Gaussian random processes, this suffices to provide a complete statistical description of the output of a linear operation.


5.9.1  Filtering 

Suppose that a random process $x(t)$ is passed through a filter, or an LTI system, with transfer function $G(f)$ and impulse response $g(t)$, as shown in Figure 5.25.

The PSD of the output $y(t)$ is related to that of the input as follows:

$$ S_y(f) = S_x(f)\, |G(f)|^2 \qquad (5.97) $$
Figure 5.25: Random process through an LTI system.


This follows immediately from the operational definition of PSD in Figure 5.18, since the power gain due to the filter at frequency $f$ is $|G(f)|^2$. Now,

$$ |G(f)|^2 = G(f)\,G^*(f) \leftrightarrow (g * g_{MF})(t) $$

where $g_{MF}(t) = g^*(-t)$. Thus, taking the inverse Fourier transform on both sides of (5.97), we obtain the following relation between the input and output autocorrelation functions:

$$ R_y(\tau) = (R_x * g * g_{MF})(\tau) \qquad (5.98) $$

Let  us  now  derive  analogous  results  for  ensemble  averages  for  filtered  WSS  processes. 


Filtered  WSS  random  processes 

Suppose that a WSS random process $X$ is passed through an LTI system with impulse response $g(t)$ (which we allow to be complex-valued) to obtain an output $Y(t) = (X * g)(t)$. We wish to characterize the joint second order statistics of $X$ and $Y$.

Defining the crosscorrelation function of $Y$ and $X$ as

$$ R_{YX}(t + \tau, t) = E[Y(t + \tau)\, X^*(t)] $$

we have

$$ R_{YX}(t + \tau, t) = E\left[\left(\int X(t + \tau - u)\, g(u)\,du\right) X^*(t)\right] = \int R_X(\tau - u)\, g(u)\,du \qquad (5.99) $$

interchanging expectation and integration. Thus, $R_{YX}(t + \tau, t)$ depends only on the time difference $\tau$. We therefore denote it by $R_{YX}(\tau)$. From (5.99), we see that

$$ R_{YX}(\tau) = (R_X * g)(\tau) $$

The autocorrelation function of $Y$ is given by

$$ R_Y(t + \tau, t) = E[Y(t + \tau)\, Y^*(t)] = E\left[Y(t + \tau)\left(\int X(t - u)\, g(u)\,du\right)^*\right] $$

$$ = \int E[Y(t + \tau)\, X^*(t - u)]\, g^*(u)\,du = \int R_{YX}(\tau + u)\, g^*(u)\,du \qquad (5.100) $$

Thus, $R_Y(t + \tau, t)$ depends only on the time difference $\tau$, and we denote it by $R_Y(\tau)$. Recalling that the matched filter $g_{MF}(u) = g^*(-u)$ and replacing $u$ by $-u$ in the integral at the end of (5.100), we obtain that

$$ R_Y(\tau) = (R_{YX} * g_{MF})(\tau) = (R_X * g * g_{MF})(\tau) $$
Finally, we note that the mean function of $Y$ is a constant given by

$$ m_Y = m_X \int_{-\infty}^{\infty} g(u)\,du = m_X\, G(0) $$

Thus, $X$ and $Y$ are jointly WSS: $X$ is WSS, $Y$ is WSS, and their crosscorrelation function depends only on the time difference. The formulas for the second order statistics, including the corresponding power spectral densities obtained by taking Fourier transforms, are collected below:

$$ \begin{aligned} R_{YX}(\tau) &= (R_X * g)(\tau), & S_{YX}(f) &= S_X(f)\, G(f) \\ R_Y(\tau) &= (R_{YX} * g_{MF})(\tau) = (R_X * g * g_{MF})(\tau), & S_Y(f) &= S_{YX}(f)\, G^*(f) = S_X(f)\, |G(f)|^2 \end{aligned} \qquad (5.101) $$
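One consequence of (5.101): for a white input with $R_X(\tau) = \sigma^2\delta(\tau)$, the crosscorrelation is $R_{YX}(\tau) = \sigma^2 g(\tau)$, so correlating the output against the input recovers the impulse response. A Python sketch assuming NumPy follows; it discretizes white noise as independent samples of variance $\sigma^2/dt$, uses a hypothetical $g(t) = e^{-t}$ truncated to $[0, 5)$, and replaces ensemble averages by time averages, which presumes ergodicity.

```python
import numpy as np


def estimate_cross_correlation(y, x, max_lag):
    """Time-average estimate of R_YX(m dt) = E[Y(t + m dt) X(t)] for m = 0, ..., max_lag - 1."""
    n = x.size - max_lag
    return np.array([np.mean(y[m:m + n] * x[:n]) for m in range(max_lag)])


dt = 0.05
sigma2 = 1.0                                  # two-sided PSD of the white input
g = np.exp(-np.arange(0.0, 5.0, dt))          # hypothetical g(t) = e^{-t}, truncated at t = 5
rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(sigma2 / dt), size=400_000)  # discretized white noise
y = np.convolve(x, g)[: x.size] * dt          # causal filtering: y[k] = dt * sum_j g[j] x[k - j]
r_yx = estimate_cross_correlation(y, x, g.size)
print(np.max(np.abs(r_yx - sigma2 * g)))      # small compared with max(g) = 1
```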

Let us apply these results to infinite-power white noise (we do not need to invoke Gaussianity to compute second order statistics). While the input has infinite power, as shown in the example below, if the filter impulse response is square integrable, then the output has finite power, and is equal to what we would have obtained if we had assumed that the noise was bandlimited to start with.

Example 5.9.1 (white noise through an LTI system: general formulas) White noise with PSD $S_n(f) = N_0/2$ is passed through an LTI system with impulse response $g(t)$. We wish to find the PSD, autocorrelation function, and power of the output $y(t) = (n * g)(t)$. The PSD is given by

$$ S_y(f) = S_n(f)\, |G(f)|^2 = \frac{N_0}{2} |G(f)|^2 \qquad (5.102) $$

We can compute the autocorrelation function directly or take the inverse Fourier transform of the PSD to obtain

$$ R_y(\tau) = (R_n * g * g_{MF})(\tau) = \frac{N_0}{2} (g * g_{MF})(\tau) = \frac{N_0}{2} \int g(s)\, g^*(s - \tau)\,ds \qquad (5.103) $$

The output power is given by

$$ \int_{-\infty}^{\infty} S_y(f)\,df = \frac{N_0}{2} \int_{-\infty}^{\infty} |G(f)|^2\,df = \frac{N_0}{2} \int_{-\infty}^{\infty} |g(t)|^2\,dt = \frac{N_0}{2} \|g\|^2 \qquad (5.104) $$

where the time domain expression follows from Parseval's identity, or from setting $\tau = 0$ in (5.103). Thus, the output noise power equals the two-sided noise PSD $N_0/2$ times the energy of the filter impulse response. It is worth noting that the PSD of $y$ is the same as what we would have gotten if the input were bandlimited white noise, as long as the band is large enough to encompass frequencies where $G(f)$ is nonzero. Even if $G(f)$ is not strictly bandlimited, we get approximately the right answer if the input noise bandwidth is large enough so that most of the energy in $G(f)$ falls within it.
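Relation (5.104) can be checked with a Python sketch assuming NumPy. White noise with two-sided PSD $N_0/2$ is discretized as independent samples of variance $(N_0/2)/dt$ (an assumption of the sketch), the convolution is approximated by a Riemann sum, and the impulse response $g(t) = e^{-t}$ on $[0, 5)$ is a hypothetical choice. The measured output power should match $\frac{N_0}{2}\|g\|^2$ computed on the same grid, to within the statistical error of a finite-length average.

```python
import numpy as np


def filtered_white_noise_power(g, dt, n0, n_samples, rng):
    """Compare measured output power with (N0/2) ||g||^2, as in (5.104).

    White noise with two-sided PSD N0/2 is discretized as i.i.d. samples of
    variance (N0/2)/dt; the convolution integral becomes a Riemann sum.
    """
    noise = rng.normal(0.0, np.sqrt(0.5 * n0 / dt), size=n_samples)
    y = np.convolve(noise, g, mode="valid") * dt   # keep only fully overlapped outputs
    measured = np.mean(y ** 2)
    predicted = 0.5 * n0 * np.sum(np.abs(g) ** 2) * dt
    return measured, predicted


dt = 0.05
g = np.exp(-np.arange(0.0, 5.0, dt))   # hypothetical impulse response g(t) = e^{-t}, truncated at t = 5
rng = np.random.default_rng(0)
measured, predicted = filtered_white_noise_power(g, dt, n0=1.0, n_samples=400_000, rng=rng)
print(measured, predicted)  # agree to within a few percent
```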

When the input random process is Gaussian as well as WSS, the output is also WSS and Gaussian, and the preceding computations of second order statistics provide a complete statistical characterization of the output process. This is illustrated by the following example, in which WGN is passed through a filter.

Example 5.9.2 (WGN through a boxcar impulse response) Suppose that WGN $n(t)$ with PSD $\sigma^2 = N_0/2 = \frac{1}{4}$ is passed through an LTI system with impulse response $g(t) = I_{[0,2]}(t)$ to obtain the output $y(t) = (n * g)(t)$.

(a) Find the autocorrelation function and PSD of $y$.

(b) Find $E[y^2(100)]$.

(c) True or False: $y$ is a stationary random process.

(d) True or False: $y(100)$ and $y(102)$ are independent random variables.

(e) True or False: $y(100)$ and $y(101)$ are independent random variables.

(f) Compute the probability $P[y(100) - 2y(101) + 3y(102) > 5]$.

(g) Which of the preceding results rely on the Gaussianity of $n$?

Solution

(a) Since $n$ is WSS, so is $y$. The filter matched to $g$ is a boxcar as well: $g_{MF}(t) = I_{[-2,0]}(t)$. Their convolution is a triangular pulse centered at the origin: $(g * g_{MF})(\tau) = 2\left(1 - \frac{|\tau|}{2}\right) I_{[-2,2]}(\tau)$. We therefore have

$$ R_y(\tau) = \sigma^2 (g * g_{MF})(\tau) = \frac{1}{2}\left(1 - \frac{|\tau|}{2}\right) I_{[-2,2]}(\tau) = C_y(\tau) $$

(since $y$ is zero mean). The PSD is given by

$$ S_y(f) = \frac{N_0}{2} |G(f)|^2 = \mathrm{sinc}^2(2f) $$

since $|G(f)| = |2\,\mathrm{sinc}(2f)|$. Note that these results do not rely on Gaussianity.

(b) The power $E[y^2(100)] = R_y(0) = \frac{1}{2}$.

(c) The output $y$ is a Gaussian random process, since it is obtained by a linear transformation of the Gaussian random process $n$. Since $y$ is WSS and Gaussian, it is stationary. True.

(d) The random variables $y(100)$ and $y(102)$ are jointly Gaussian with zero mean and covariance $\mathrm{cov}(y(100), y(102)) = C_y(2) = R_y(2) = 0$. Since they are jointly Gaussian and uncorrelated, they are independent. True.

(e) In this case, $\mathrm{cov}(y(100), y(101)) = C_y(1) = R_y(1) = \frac{1}{4} \neq 0$, so that $y(100)$ and $y(101)$ are not independent. False.

(f) The random variable $Z = y(100) - 2y(101) + 3y(102)$ is zero mean and Gaussian, with

$$ \begin{aligned} \mathrm{var}(Z) &= \mathrm{cov}\left(y(100) - 2y(101) + 3y(102),\; y(100) - 2y(101) + 3y(102)\right) \\ &= \mathrm{cov}(y(100), y(100)) + 4\,\mathrm{cov}(y(101), y(101)) + 9\,\mathrm{cov}(y(102), y(102)) \\ &\quad - 4\,\mathrm{cov}(y(100), y(101)) + 6\,\mathrm{cov}(y(100), y(102)) - 12\,\mathrm{cov}(y(101), y(102)) \\ &= C_y(0) + 4C_y(0) + 9C_y(0) - 4C_y(1) + 6C_y(2) - 12C_y(1) \\ &= 14C_y(0) - 16C_y(1) + 6C_y(2) = 3 \end{aligned} $$

substituting $C_y(0) = \frac{1}{2}$, $C_y(1) = \frac{1}{4}$, $C_y(2) = 0$. Thus, $Z \sim N(0, 3)$, and the required probability can be evaluated as

$$ P[Z > 5] = Q\left(\frac{5}{\sqrt{3}}\right) \approx 0.0019 $$

(g) We invoke Gaussianity in (c), (d), and (f).
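The numbers in Example 5.9.2 can be reproduced with a short Python sketch assuming NumPy and the standard library. It encodes $R_y(\tau) = \sigma^2 (g * g_{MF})(\tau)$ with $\sigma^2 = 1/4$, checks the triangular shape of $g * g_{MF}$ by a Riemann-sum convolution, and evaluates $\mathrm{var}(Z) = \sum_{i,j} a_i a_j C_y(t_i - t_j)$ directly instead of expanding the covariance by hand. Helper names are illustrative.

```python
import math
import numpy as np


def Q(x):
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))


SIGMA2 = 0.25  # WGN PSD sigma^2 = N0/2 in Example 5.9.2


def R_y(tau):
    """R_y(tau) = sigma^2 (g * g_MF)(tau), with (g * g_MF)(tau) = 2 (1 - |tau|/2) on [-2, 2]."""
    return SIGMA2 * 2.0 * max(0.0, 1.0 - abs(tau) / 2.0)


# Riemann-sum check of the triangle (g * g_MF) for g = I_[0,2]
dt = 1e-3
g = np.ones(2000)                    # samples of I_[0,2](t) with spacing dt
tri = np.convolve(g, g[::-1]) * dt   # lag 0 sits at index g.size - 1
lag0 = g.size - 1

# (b), (d), (e): R_y at lags 0, 2, 1
print(R_y(0), R_y(2), R_y(1))        # 0.5 0.0 0.25

# (f): Z = y(100) - 2 y(101) + 3 y(102)
coeffs = {100: 1.0, 101: -2.0, 102: 3.0}
var_Z = sum(a * b * R_y(ti - tj) for ti, a in coeffs.items() for tj, b in coeffs.items())
p_f = Q(5 / math.sqrt(var_Z))
print(var_Z, round(p_f, 4))          # 3.0 0.0019
```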


5.9.2  Correlation 

As we shall see in Chapter 6, a typical operation in a digital communication receiver is to correlate a noisy received waveform against one or more noiseless templates. Specifically, the correlation of $y(t)$ (e.g., a received signal) against $g(t)$ (e.g., a noiseless template at the receiver) is defined as the inner product between $y$ and $g$, given by

$$ Z = \langle y, g \rangle = \int_{-\infty}^{\infty} y(t)\, g^*(t)\,dt \qquad (5.105) $$

(We restrict attention to real-valued signals in example computations provided here, but the preceding notation is general enough to include complex-valued signals.)
Signal-to-Noise Ratio and its Maximization


If $y(t)$ is a random process, we can compute the mean and variance of $\langle y, g \rangle$ given the second order statistics (i.e., mean function and autocorrelation function) of $y$, as shown in Problem 5.51. However, let us consider here a special case of particular interest in the study of communication systems:

$$ y(t) = s(t) + n(t) $$

where we now restrict attention to real-valued signals for simplicity, with $s(t)$ denoting a deterministic signal (e.g., corresponding to a specific choice of transmitted symbols) and $n(t)$ zero mean white noise with PSD $S_n(f) = N_0/2$. The output of correlating $y$ against $g$ is given by

$$ \langle y, g \rangle = \int_{-\infty}^{\infty} s(t)\, g(t)\,dt + \int_{-\infty}^{\infty} n(t)\, g(t)\,dt $$

Since both the signal and noise terms scale up by identical factors if we scale up $g$, a performance metric of interest is the ratio of the signal power to the noise power at the output of the correlator, defined as follows:

$$ \mathrm{SNR} = \frac{|\langle s, g \rangle|^2}{E\left[|\langle n, g \rangle|^2\right]} $$

How should we choose $g$ in order to maximize SNR? In order to answer this, we need to compute the noise power in the denominator. We can rewrite it as

$$ E\left[|\langle n, g \rangle|^2\right] = E\left[\int n(t)\, g(t)\,dt \int n(s)\, g(s)\,ds\right] $$

where we need to use two different dummy variables of integration to make sure we capture all the cross terms in the two integrals. Now, we take the expectation inside the integrals, grouping all random quantities together inside the expectation:

$$ E\left[|\langle n, g \rangle|^2\right] = \int\!\!\int E[n(t)n(s)]\, g(t)g(s)\,dt\,ds = \int\!\!\int R_n(t - s)\, g(t)g(s)\,dt\,ds $$

This is where the infinite-power white noise model becomes useful: plugging in $R_n(t - s) = \frac{N_0}{2}\delta(t - s)$, we find that the two integrals collapse into one, and obtain that

$$ E\left[|\langle n, g \rangle|^2\right] = \frac{N_0}{2} \int\!\!\int \delta(t - s)\, g(t)g(s)\,dt\,ds = \frac{N_0}{2} \int |g(t)|^2\,dt = \frac{N_0}{2} \|g\|^2 \qquad (5.106) $$


Thus, the SNR can be rewritten as

$$ \mathrm{SNR} = \frac{|\langle s, g \rangle|^2}{\frac{N_0}{2}\|g\|^2} = \frac{2}{N_0} \left|\left\langle s, \frac{g}{\|g\|} \right\rangle\right|^2 $$

Drawing on the analogy between signals and vectors, note that $g/\|g\|$ is the "unit vector" pointing along $g$. We wish to choose $g$ such that the size of the projection of the signal $s$ along this unit vector is maximized. Clearly, this is accomplished by choosing the unit vector along the direction of $s$. (A formal proof using the Cauchy-Schwarz inequality is provided in Problem 5.50.) That is, we must choose $g$ to be a scalar multiple of $s$ (any scalar multiple will do, since SNR is a scale-invariant quantity). In general, for complex-valued signals in complex-valued white noise (useful for modeling in complex baseband), it can be shown that $g$ must be a scalar multiple of $s(t)$, so that the correlator computes $\langle y, s \rangle = \int y(t)\, s^*(t)\,dt$. When we plug this in, the maximum SNR we obtain is $2\|s\|^2/N_0$. These results are important enough to state formally, and we do this below.
Theorem 5.9.1 For linear processing of a signal s(t) corrupted by white noise, the output SNR is maximized by correlating against s(t). The resulting SNR is given by

$$\mathrm{SNR}_{\max} = \frac{\|s\|^2}{N_0/2} = \frac{2\|s\|^2}{N_0} \qquad (5.107)$$
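Theorem 5.9.1 can be illustrated by evaluating the correlator SNR, $|\langle s,g\rangle|^2 / ((N_0/2)\|g\|^2)$, for a few candidate templates. In this sketch the inner products are approximated by Riemann sums, and the signal s (a half-cycle of a sine), the alternative templates, and the value of $N_0$ are hypothetical choices. Only scalar multiples of s attain $2\|s\|^2/N_0$; every other template falls short, as the Cauchy-Schwarz inequality predicts.

```python
import math


def correlator_snr(s, g, dt, n0):
    """Output SNR <s, g>^2 / ((N0/2) ||g||^2) of a correlator with template g.

    Inner products are approximated by Riemann sums on a grid of step dt.
    """
    signal = sum(sk * gk for sk, gk in zip(s, g)) * dt
    noise_power = n0 / 2 * sum(gk * gk for gk in g) * dt
    return signal ** 2 / noise_power


dt = 0.01
t = [k * dt for k in range(100)]                    # grid on [0, 1)
s = [math.sin(math.pi * tk) for tk in t]            # hypothetical signal
n0 = 0.5                                            # hypothetical N0
snr_max = sum(sk * sk for sk in s) * dt / (n0 / 2)  # 2||s||^2/N0, as in (5.107)

candidates = {
    "g = s": s,
    "g = 3s": [3 * sk for sk in s],
    "g = rect": [1.0] * len(t),
    "g = ramp": list(t),
}
for name, g in candidates.items():
    print(f"{name:9s} SNR = {correlator_snr(s, g, dt, n0):.4f}  (max {snr_max:.4f})")
```

The scale invariance noted above shows up in the identical SNRs for `g = s` and `g = 3s`.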


The  expression  (5.106)  for  the  noise  power  at  the  output  of  a  correlator  is  analogous  to  the 
expression  (5.104)  (Example  5.9.1)  for  the  power  of  white  noise  through  a  filter.  This  is  no 
coincidence.  Any  correlation  operation  can  be  implemented  using  a  filter  and  sampler,  as  we 
discuss  next. 


Matched  Filter 

Correlation with a waveform g(t) can be achieved using a filter h(t) = g(−t) and sampling at time t = 0. To see this, note that the filter output at time 0 is

$$z(0) = \int_{-\infty}^{\infty} y(\tau)h(-\tau)\,d\tau = \int_{-\infty}^{\infty} y(\tau)g(\tau)\,d\tau$$

Comparing with the correlator output (5.105), we see that Z = z(0). Now, applying Theorem 5.9.1, we see that the SNR is maximized by choosing the filter impulse response as $s^*(-t)$. As we know, this is called the matched filter for s, and we denote its impulse response as $s_{MF}(t) = s^*(-t)$. We can now restate Theorem 5.9.1 as follows.

Theorem 5.9.2 For linear processing of a signal s(t) corrupted by white noise, the output SNR is maximized by employing a matched filter with impulse response $s_{MF}(t) = s^*(-t)$, sampled at time t = 0.
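The equivalence between correlation and filtering followed by sampling can be seen in discrete time, where integrals become sums. In this illustrative sketch the sample values are hypothetical, and signals are stored as dictionaries from integer sample index to value so that the time-reversed filter h[k] = g[−k] can have negative indices. The filter output at index 0 reproduces the correlation $\sum_k y[k]g[k]$.

```python
def convolve_at(x, h, m):
    """Discrete convolution (x * h)[m] = sum_k x[k] h[m - k].

    Signals are dicts mapping integer sample index -> value; missing
    indices are treated as zero.
    """
    return sum(xk * h.get(m - k, 0.0) for k, xk in x.items())


g = {0: 1.0, 1: 2.0, 2: -1.0, 3: 0.5}            # hypothetical template samples
y = {0: 0.3, 1: 1.7, 2: -0.4, 3: 1.1, 4: 2.0}    # hypothetical received samples
h = {-k: gk for k, gk in g.items()}               # time-reversed filter h[k] = g[-k]

correlation = sum(yk * g.get(k, 0.0) for k, yk in y.items())  # sum_k y[k] g[k]
filtered_then_sampled = convolve_at(y, h, 0)                   # (y * h)[0]
print(correlation, filtered_then_sampled)
```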


Figure 5.26: A signal passed through its matched filter gives a peak at time t = 0. When the signal is delayed by $t_0$, the peak occurs at $t = t_0$.


The statistics of the noise contribution to the matched filter output do not depend on the sampling time (WSS noise into an LTI system yields a WSS random process); hence the optimum sampling time is determined by the peak of the signal contribution to the matched filter output. The signal contribution to the output of the matched filter at time t is given by


$$z(t) = \int s(\tau)\,s_{MF}(t-\tau)\,d\tau = \int s(\tau)\,s^*(\tau - t)\,d\tau$$


This is simply the correlation of the signal with itself at delay t. Thus, the matched filter enables us to implement an infinite bank of correlators, each corresponding to a version of our signal template at a different delay. Figure 5.26 shows a rectangular pulse passed through its matched filter. For received signal y(t) = s(t) + n(t), we have observed that the optimum sampling time (i.e., the correlator choice maximizing SNR) is t = 0. More generally, when the received signal is given by $y(t) = s(t - t_0) + n(t)$, the peak of the signal contribution to the matched filter output shifts to $t = t_0$, which now becomes the optimum sampling time.
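The delay-tracking property just described can be illustrated with a discrete-time sketch. A rectangular pulse, delayed by a hypothetical d = 25 samples and optionally corrupted by Gaussian noise (standard deviation 0.1, also hypothetical), is passed through its matched filter, and the index of the output peak serves as the estimate of the delay. Real-valued samples are assumed, so $s_{MF}[k] = s[-k]$, and only nonnegative output indices are computed.

```python
import random


def matched_filter_output(y, s):
    """z[m] = sum_k y[k] s[k - m] for m = 0, ..., len(y) - 1.

    For real-valued s this is (y * s_mf)[m] with s_mf[k] = s[-k]: the
    correlation of y against a copy of s delayed by m samples.
    """
    z = []
    for m in range(len(y)):
        stop = min(len(y), m + len(s))
        z.append(sum(y[k] * s[k - m] for k in range(m, stop)))
    return z


s = [1.0] * 10            # rectangular pulse of 10 samples (hypothetical)
d = 25                    # true delay t0 in samples (hypothetical)
n_samples = 60
clean = [s[k - d] if d <= k < d + len(s) else 0.0 for k in range(n_samples)]

rng = random.Random(7)
noisy = [c + rng.gauss(0.0, 0.1) for c in clean]

z_clean = matched_filter_output(clean, s)
z_noisy = matched_filter_output(noisy, s)
peak_clean = max(range(n_samples), key=lambda m: z_clean[m])
peak_noisy = max(range(n_samples), key=lambda m: z_noisy[m])
print("peak without noise:", peak_clean, " peak with noise:", peak_noisy)
```

Without noise the output is the triangular autocorrelation of the pulse, centered at d; with noise the peak may move by a sample or so, which is why the noisy estimate is only approximately d.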

While the preceding computations rely only on second order statistics, once we invoke the Gaussianity of the noise, as we do in Chapter 6, we will be able to compute probabilities (a preview of such computations is provided by Examples 5.8.2 and 5.9.2(f)). This will enable us to develop a framework for receiver design for minimizing the probability of error.


5.10  Concept  Inventory 

We do not summarize here the review of probability and random variables, but note that key concepts relevant for communication systems modeling are conditional probabilities and densities, and associated results such as the law of total probability and Bayes’ rule. As we see in much greater detail in Chapter 6, conditional probabilities and densities are used for statistical characterization of the received signal, given the transmitted signal, while Bayes’ rule can be used to infer which signal was transmitted, given the received signal.
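As a small illustration of this last point, the sketch below applies Bayes’ rule to a hypothetical binary model that is not taken from this chapter: a bit X is sent as −1 (X = 0) or +1 (X = 1), and the receiver observes Y = ±1 + N with $N \sim N(0, v^2)$. The conditional densities of Y given X, weighted by the prior probabilities, yield the posterior probability that 0 was sent; the denominator is the law of total probability.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior_zero(y, prior0=0.5, var=1.0):
    """P[X = 0 | Y = y] by Bayes' rule for the hypothetical model
    Y = -1 + N if X = 0, Y = +1 + N if X = 1, with N ~ N(0, var)."""
    joint0 = prior0 * gaussian_pdf(y, -1.0, var)        # P[X=0] p(y | X=0)
    joint1 = (1 - prior0) * gaussian_pdf(y, +1.0, var)  # P[X=1] p(y | X=1)
    return joint0 / (joint0 + joint1)                   # denominator: total probability


for y in (-2.0, 0.0, 0.5, 2.0):
    print(y, round(posterior_zero(y), 4), round(posterior_zero(y, prior0=0.9), 4))
```

For equal priors this posterior simplifies to $1/(1 + e^{2y/v^2})$; a larger prior on 0 shifts every posterior toward 0.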

Gaussian  random  variables 

•  A Gaussian random variable $X \sim N(m, v^2)$ is characterized by its mean m and variance $v^2$.

•  Gaussianity is preserved under translation and scaling. Particularly useful is the transformation to a standard (N(0, 1)) Gaussian random variable: if $X \sim N(m, v^2)$, then $\frac{X - m}{v} \sim N(0, 1)$. This allows probabilities involving any Gaussian random variable to be expressed in terms of the CDF Φ(x) and CCDF Q(x) for a standard Gaussian random variable.

•  Random variables $X_1, \ldots, X_n$ are jointly Gaussian, or $X = (X_1, \ldots, X_n)^T$ is a Gaussian random vector, if any linear combination $a^T X = a_1X_1 + \ldots + a_nX_n$ is a Gaussian random variable.

•  A Gaussian random vector $X \sim N(m, C)$ is completely characterized by its mean vector m and covariance matrix C.

•  Uncorrelated and jointly Gaussian random variables are independent.

•  The joint density for $X \sim N(m, C)$ exists if and only if C is invertible.

•  The mean vector and covariance matrix evolve separately under affine transformations: for $Y = AX + b$, $m_Y = Am_X + b$ and $C_Y = AC_XA^T$.

•  Joint Gaussianity is preserved under affine transformations: if $X \sim N(m, C)$ and $Y = AX + b$, then $Y \sim N(Am + b, ACA^T)$.
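The standardization and affine-transformation rules above translate directly into computation. The sketch below uses only the Python standard library: it evaluates Q(x) through the identity $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$, computes a Gaussian tail probability by standardizing, and propagates a mean vector and covariance matrix through Y = AX + b. The particular m, C, A and b are hypothetical values chosen for illustration.

```python
import math


def Q(x):
    """Standard Gaussian CCDF, Q(x) = P[N(0,1) > x] = erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def prob_greater(a, m, v2):
    """P[X > a] for X ~ N(m, v2), via the standardization (X - m)/v ~ N(0, 1)."""
    return Q((a - m) / math.sqrt(v2))


def affine_gaussian(A, b, m, C):
    """Mean A m + b and covariance A C A^T of Y = A X + b (matrices as lists of rows)."""
    rows, cols = len(A), len(A[0])
    mean = [sum(A[i][k] * m[k] for k in range(cols)) + b[i] for i in range(rows)]
    AC = [[sum(A[i][k] * C[k][j] for k in range(cols)) for j in range(cols)]
          for i in range(rows)]
    cov = [[sum(AC[i][k] * A[j][k] for k in range(cols)) for j in range(rows)]
           for i in range(rows)]
    return mean, cov


m = [1.0, -1.0]                  # hypothetical mean vector of X
C = [[2.0, 0.5], [0.5, 1.0]]     # hypothetical covariance matrix of X
A = [[1.0, 2.0], [1.0, -1.0]]    # Y1 = X1 + 2 X2, Y2 = X1 - X2
b = [0.0, 0.0]
mean_y, cov_y = affine_gaussian(A, b, m, C)
print(mean_y, cov_y)                               # [-1.0, 2.0] [[8.0, 0.5], [0.5, 2.0]]
print(prob_greater(0.0, mean_y[0], cov_y[0][0]))   # P[Y1 > 0] = Q(1/sqrt(8))
```

Because Y is again Gaussian, its mean and covariance are all that is needed to compute any probability involving $Y_1$, $Y_2$ or linear combinations of them.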

Random  processes 

•  A  random  process  is  a  generalization  of  the  concept  of  random  vector;  it  is  a  collection  of 
random  variables  on  a  common  probability  space. 

•  While statistical characterization of a random process requires specification of the finite-dimensional distributions, coarser characterization via its second order statistics (the mean and autocorrelation functions) is often employed.

•  A  random  process  X  is  stationary  if  its  statistics  are  shift-invariant;  it  is  WSS  if  its  second 
order  statistics  are  shift-invariant. 

•  A  random  process  is  Gaussian  if  any  collection  of  samples  is  a  Gaussian  random  vector,  or 
equivalently,  if  any  linear  combination  of  any  collection  of  samples  is  a  Gaussian  random  variable. 

•  A  Gaussian  random  process  is  completely  characterized  by  its  mean  and  autocorrelation  (or 
mean  and  autocovariance)  functions. 

•  A  stationary  process  is  WSS.  A  WSS  Gaussian  random  process  is  stationary. 

•  The  autocorrelation  function  and  the  power  spectral  density  form  a  Fourier  transform  pair. 
(This  observation  applies  both  to  time  averages  and  to  ensemble  averages  for  WSS  processes.) 

•  The most common model for noise in communication systems is WGN. WGN n(t) is zero mean, WSS, Gaussian with a flat PSD $S_n(f) = \frac{N_0}{2} = \sigma^2 \leftrightarrow R_n(\tau) = \sigma^2\delta(\tau)$. While physically unrealizable (it has infinite power), it is a useful mathematical abstraction for modeling the flatness of the noise PSD over the band of interest. In complex baseband, noise is modeled as I and Q components which are independent real-valued WGN processes.

•  A WSS random process X through an LTI system with impulse response g(t) yields a WSS random process Y. X and Y are also jointly WSS. We have $S_Y(f) = S_X(f)|G(f)|^2 \leftrightarrow R_Y(\tau) = (R_X * g * g_{MF})(\tau)$.

•  The statistics of WGN after linear operations such as correlation and filtering are easy to compute because of its impulsive autocorrelation function.

•  When the received signal equals signal plus WGN, the SNR is maximized by matched filtering against the signal.
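The relation $R_Y(\tau) = (R_X * g * g_{MF})(\tau)$ takes a particularly simple form for white input: with $R_X(\tau) = \sigma^2\delta(\tau)$, the output autocorrelation is $\sigma^2(g * g_{MF})(\tau)$, the scaled autocorrelation of the impulse response. The sketch below checks a discrete-time analogue by simulation. Its assumptions: i.i.d. Gaussian input samples of variance $\sigma^2$ stand in for WGN, a short hypothetical impulse response g is used, and ensemble averages are replaced by time averages over one long realization.

```python
import random


def theoretical_autocorr(g, sigma2, lag):
    """sigma^2 * sum_k g[k] g[k + lag]: discrete analogue of sigma^2 (g * g_mf)(lag)."""
    lag = abs(lag)
    return sigma2 * sum(g[k] * g[k + lag] for k in range(len(g) - lag))


def filtered_white_noise(g, sigma2, n, seed=3):
    """n samples of y = w * g, where w is i.i.d. N(0, sigma2) (discrete 'white' noise)."""
    rng = random.Random(seed)
    L = len(g)
    w = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n + L - 1)]
    # y[k] = sum_j g[j] w[k + L - 1 - j]; the offset avoids negative indices
    return [sum(g[j] * w[k + L - 1 - j] for j in range(L)) for k in range(n)]


def time_averaged_autocorr(y, lag):
    """Time-average estimate of E[y[k] y[k + lag]]."""
    return sum(y[k] * y[k + lag] for k in range(len(y) - lag)) / (len(y) - lag)


g = [1.0, 0.5, -0.25]     # hypothetical impulse response
sigma2 = 2.0              # hypothetical input noise variance
y = filtered_white_noise(g, sigma2, n=100000)
for lag in range(4):
    print(lag, round(time_averaged_autocorr(y, lag), 3), theoretical_autocorr(g, sigma2, lag))
```

The output autocorrelation vanishes beyond lag len(g) − 1, reflecting the finite memory the filter introduces into otherwise uncorrelated noise.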


5.11  Endnotes 

There  are  a  number  of  textbooks  on  probability  and  random  processes  for  engineers  that  can 
be  used  to  supplement  the  brief  communications-centric  exposition  here,  including  Yates  and 
Goodman  [21],  Woods  and  Stark  [22],  Leon-Garcia  [23],  and  Papoulis  and  Pillai  [24]. 

A  more  detailed  treatment  of  the  noise  analysis  for  analog  modulation  provided  in  Appendix  5.E 
can  be  found  in  a  number  of  communication  theory  texts,  with  Ziemer  and  Tranter  [4]  providing 
a  sound  exposition. 

As a historical note, thermal noise, which plays such a crucial role in communications systems design, was first experimentally characterized in 1928 by Johnson [25]. Johnson discussed his results with Nyquist, who quickly came up with a theoretical characterization [26]. See [27] for a modern re-derivation of Nyquist’s formula, and [28] for a discussion of noise in transistors. These papers and the references therein are good resources for further exploration into the physical basis for noise, which we can only hint at here in Appendix 5.G. Of course, as discussed in Section 5.8, from a communication systems designer’s point of view, it typically suffices to abstract away from such physical considerations, using the noise figure as a single number summarizing the effect of receiver circuit noise.


5.12  Problems 

Conditional  probabilities,  law  of  total  probability,  and  Bayes’  rule 

Problem 5.1 You are given a pair of dice (each with six sides). One is fair, the other is unfair. The probability of rolling 6 with the unfair die is 1/2, while the probability of rolling each of 1 through 5 is 1/10. You now pick one of the dice at random and begin rolling. Conditioned on the die picked, successive rolls are independent.

(a) Conditioned on picking the unfair die, what is the probability of the sum of the numbers in the first two rolls being equal to 10?

(b) Conditioned on getting a sum of 10 in your first two throws, what is the probability that you picked the unfair die?


Problem  5.2  A  student  who  studies  for  an  exam  has  a  90%  chance  of  passing.  A  student  who 
does  not  study  for  the  exam  has  a  90%  chance  of  failing.  Suppose  that  70%  of  the  students 
studied  for  the  exam. 

(a)  What  is  the  probability  that  a  student  fails  the  exam? 

(b)  What  is  the  probability  that  a  student  who  fails  studied  for  the  exam? 

(c)  What  is  the  probability  that  a  student  who  fails  did  not  study  for  the  exam? 

(d)  Would  you  expect  the  probabilities  in  (b)  and  (c)  to  add  up  to  one? 
Problem 5.3 A receiver decision statistic Y in a communication system is modeled as exponential with mean 1 if 0 is sent, and as exponential with mean 10 if 1 is sent. Assume that we send 0 with probability 0.6.

(a) Find the conditional probability that Y > 5, given that 0 is sent.

(b) Find the conditional probability that Y > 5, given that 1 is sent.

(c) Find the unconditional probability that Y > 5.

(d) Given that Y > 5, what is the probability that 0 is sent?

(e) Given that Y = 5, what is the probability that 0 is sent?

Problem 5.4 Channel codes are constructed by introducing redundancy in a structured fashion. A canonical means of doing this is by introducing parity checks. In this problem, we see how one can make inferences based on three bits $b_1, b_2, b_3$ which satisfy a parity check equation: $b_1 \oplus b_2 \oplus b_3 = 0$. Here ⊕ denotes an exclusive or (XOR) operation.

(a) Suppose that we know that $P[b_1 = 0] = 0.8$ and $P[b_2 = 1] = 0.9$, and model $b_1$ and $b_2$ as independent. Find the probability $P[b_3 = 0]$.

(b) Define the log likelihood ratio (LLR) for a bit b as $LLR(b) = \log\frac{P[b = 0]}{P[b = 1]}$. Setting $L_i = LLR(b_i)$, i = 1, 2, 3, find an expression for $L_3$ in terms of $L_1$ and $L_2$, again modeling $b_1$ and $b_2$ as independent.

Problem 5.5 A bit X ∈ {0, 1} is repeatedly transmitted using n independent uses of a binary symmetric channel (i.e., the binary channel in Figure 5.2 with a = b) with crossover probability a = 0.1. The receiver uses a majority rule to make a decision on the transmitted bit.

(a) Let Z denote the number of ones at the channel output. (Z takes values 0, 1, ..., n.) Specify the probability mass function of Z, conditioned on X = 0.

(b) Conditioned on X = 0, what is the probability of deciding that one was sent (i.e., what is the probability of making an error)?

(c) Find the posterior probabilities P[X = 0 | Z = m], m = 0, 1, ..., 5, assuming that 0 or 1 are equally likely to be sent. Do a stem plot against m.

(d) Repeat (c) assuming that 0 is sent with probability 0.9.

(e) As an alternative visualization, plot the LLR $\log\frac{P[X = 0|Z = m]}{P[X = 1|Z = m]}$ versus m for (c) and (d).


Figure 5.27: Two-input four-output channel for Problem 5.6 (received values +3, +1, −1, −3).


Problem 5.6 Consider the two-input, four-output channel with transition probabilities shown in Figure 5.27. In your numerical computations, take p = 0.05, q = 0.1, r = 0.3. Denote the channel input by X and the channel output by Y.

(a) Assume that 0 and 1 are equally likely to be sent. Find the conditional probability of 0 being sent, given each possible value of the output. That is, compute P[X = 0 | Y = y] for each y ∈ {−3, −1, +1, +3}.

(b) Express the results in (a) as log likelihood ratios (LLRs). That is, compute $L(y) = \log\frac{P[X = 0|Y = y]}{P[X = 1|Y = y]}$ for each y ∈ {−3, −1, +1, +3}.

(c) Assume that a bit X, chosen equiprobably from {0, 1}, is sent repeatedly, using three independent uses of the channel. The channel outputs can be represented as a vector $\mathbf{Y} = (Y_1, Y_2, Y_3)^T$. For channel outputs $\mathbf{y} = (+1, +3, -1)^T$, find the conditional probabilities $P[\mathbf{Y} = \mathbf{y}|X = 0]$ and $P[\mathbf{Y} = \mathbf{y}|X = 1]$.

(d) Use Bayes’ rule and the result of (c) to find the posterior probability $P[X = 0|\mathbf{Y} = \mathbf{y}]$ for $\mathbf{y} = (+1, +3, -1)^T$. Also compute the corresponding LLR $L(\mathbf{y}) = \log\frac{P[X = 0|\mathbf{Y} = \mathbf{y}]}{P[X = 1|\mathbf{Y} = \mathbf{y}]}$.

(e) Would you decide 0 or 1 was sent when you see the channel output $\mathbf{y} = (+1, +3, -1)^T$?


Random  variables 

Problem  5.7  Let  X  denote  an  exponential  random  variable  with  mean  10. 

(a)  What  is  the  probability  that  X  is  bigger  than  20? 

(b)  What  is  the  probability  that  X  is  smaller  than  5? 

(c)  Suppose  that  we  know  that  X  is  bigger  than  10.  What  is  the  conditional  probability  that  it 
is  bigger  than  20? 

(d)  Find  E[e“^]. 

(e) Find $E[X^3]$.

Problem 5.8 Let $U_1, \ldots, U_n$ denote i.i.d. random variables with CDF $F_U(u)$.

(a) Let $X = \max(U_1, \ldots, U_n)$. Show that

$$P[X \le x] = F_U^n(x)$$

(b) Let $Y = \min(U_1, \ldots, U_n)$. Show that

$$P[Y \le y] = 1 - (1 - F_U(y))^n$$

(c) Suppose that $U_1, \ldots, U_n$ are uniform over [0, 1]. Plot the CDF of X for n = 1, n = 5 and n = 10, and comment on any trends that you notice.

(d) Repeat (c) for the CDF of Y.

Problem 5.9 True or False: The minimum of two independent exponential random variables is exponential.

True or False: The maximum of two independent exponential random variables is exponential.


Problem  5.10  Let  U  and  V  denote  independent  and  identically  distributed  random  variables, 
uniformly  distributed  over  [0, 1]. 

(a) Find and sketch the CDF of X = min(U, V).

Hint: It might be useful to consider the complementary CDF.

(b) Find and sketch the CDF of Y = V/U. Make sure you specify the range of values taken by Y.

Hint: It is helpful to draw pictures in the (u, v) plane when evaluating the probabilities of interest.

Problem 5.11 (Relation between Gaussian and exponential) Suppose that $X_1$ and $X_2$ are i.i.d. N(0, 1).

(a) Show that $Z = X_1^2 + X_2^2$ is exponential with mean 2.

(b) True or False: Z is independent of $\Theta = \tan^{-1}(X_2/X_1)$.

Hint: Use the results from Example 5.4.3, which tells us the joint distribution of $\sqrt{Z}$ and Θ.
Problem 5.12 (The role of the uniform random variable in simulations) Let U denote a random variable which is uniformly distributed over [0, 1].

(a) Let F(x) denote an arbitrary CDF (assume for simplicity that it is continuous). Defining $X = F^{-1}(U)$, show that X has CDF F(x).

Remark: This gives us a way of generating random variables with arbitrary distributions, assuming that we have a random number generator for uniform random variables. The method works even if X is a discrete or mixed random variable, as long as $F^{-1}$ is defined appropriately.

(b) Find a function g such that Y = g(U) is exponential with mean 2, where U is uniform over [0, 1].

(c)  Use  the  result  in  (b)  and  Matlab’s  rand()  function  to  generate  an  i.i.d.  sequence  of  1000 
exponential  random  variables  with  mean  2.  Plot  the  histogram  and  verify  that  it  has  the  right 
shape. 


Problem 5.13 (Generating Gaussian random variables) Suppose that $U_1, U_2$ are i.i.d. and uniform over [0, 1].

(a) What is the joint distribution of $Z = -2\ln U_1$ and $\Theta = 2\pi U_2$?

(b) Show that $X_1 = \sqrt{Z}\cos\Theta$ and $X_2 = \sqrt{Z}\sin\Theta$ are i.i.d. N(0, 1) random variables.

Hint: Use Example 5.4.3 and Problem 5.11.

(c) Use the result of (b) to generate 2000 i.i.d. N(0, 1) random variables from 2000 i.i.d. random variables uniformly distributed over [0, 1], using Matlab’s rand() function. Check that the histogram has the right shape.

(d)  Use  simulations  to  estimate  E[X^],  where  X  ~  X(0, 1),  and  compare  with  the  analytical 
result. 

(e)  Use  simulations  to  estimate  P[X^  +  X  >3],  where  X  X(0,1). 


Problem 5.14 (Generating discrete random variables) Let $U_1, \ldots, U_n$ denote i.i.d. random variables uniformly distributed over [0, 1] (e.g., generated by the rand() function in Matlab). Define, for i = 1, ..., n,

$$Y_i = \begin{cases} 1, & U_i > 0.7 \\ 0, & U_i \le 0.7 \end{cases}$$

(a) Sketch the CDF of $Y_i$.

(b) Find (analytically) and plot the PMF of $Z = Y_1 + \ldots + Y_n$, for n = 20.

(c)  Use  simulation  to  estimate  and  plot  the  histogram  of  Z,  and  compare  against  the  PMF  in 
(b). 

(d)  Estimate  E[Z]  by  simulation  and  compare  against  the  analytical  result. 

(e)  Estimate  E[Z^]  by  simulation. 


Gaussian  random  variables 


Problem  5.15  Two  random  variables  X  and  Y  have  joint  density 

2x^+y^ 

Px.Y{^,y)= 

xy  <U 


(a)  Find  K. 

(b)  Show  that  X  and  Y  are  each  Gaussian  random  variables. 

(c)  Express  the  probability  P[X^  -|-  X  >  2]  in  terms  of  the  Q  function. 

(d)  Are  X  and  Y  jointly  Gaussian? 

(e)  Are  X  and  Y  independent? 

(f)  Are  X  and  Y  uncorrelated? 

(g)  Find  the  conditional  density  Px\Y{x\y).  Is  it  Gaussian? 
Problem 5.16 (computations involving joint Gaussianity) The random vector $X = (X_1, X_2)^T$ is Gaussian with mean vector $m = (2, 1)^T$ and covariance matrix C given by


(a)  Let  Tj  =  Xi  +  2X2,  1^2  =  +  X2.  Find  cov{Y^,  y2). 

(b) Write down the joint density of $Y_1$ and $Y_2$.

(c) Express the probability $P[Y_1 > 2Y_2 + 1]$ in terms of the Q function.

Problem 5.17 (computations involving joint Gaussianity) The random vector $X = (X_1, X_2)^T$ is Gaussian with mean vector $m = (-3, 2)^T$ and covariance matrix C given by


(a) Let $Y_1 = 2X_1 - X_2$, $Y_2 = -X_1 + 3X_2$. Find $\mathrm{cov}(Y_1, Y_2)$.

(b) Write down the joint density of $Y_1$ and $Y_2$.

(c) Express the probability $P[Y_2 > 2Y_1 - 1]$ in terms of the Q function with positive arguments.

(d)  Express  the  probability  P\Y^  >  3Fi  + 10]  in  terms  of  the  Q  function  with  positive  arguments. 

Problem 5.18 (plotting the joint Gaussian density) For jointly Gaussian random variables X and Y, plot the density and its contours as in Figure 5.15 for the following parameters:

(a) $\sigma_X^2 = 1$, $\sigma_Y^2 = 1$, ρ = 0.

(b) $\sigma_X^2 = 1$, $\sigma_Y^2 = 1$, ρ = 0.5.

(c) $\sigma_X^2 = 4$, $\sigma_Y^2 = 1$, ρ = 0.5.

(d) Comment on the differences between the plots in the three cases.

Problem 5.19 (computations involving joint Gaussianity) In each of the three cases in Problem 5.18,

(a) specify the distribution of X − 2Y;

(b) determine whether X − 2Y is independent of X.

Problem  5.20  (computations  involving  joint  Gaussianity)  X  and  F  are  jointly  Gaussian, 
each  with  variance  one,  and  with  normalized  correlation  —  |.  The  mean  of  X  equals  one,  and 
the  mean  of  F  equals  two. 

(a)  Write  down  the  covariance  matrix. 

(b) What is the distribution of Z = 2X + 3Y?

(c)  Express  the  probability  P[Z^  —  Z  >  6]  in  terms  of  Q  function  with  positive  arguments,  and 
then  evaluate  it  numerically. 

Problem 5.21 (From Gaussian to Rayleigh, Rician, and Exponential Random Variables) Let $X_1, X_2$ be i.i.d. Gaussian random variables, each with mean zero and variance $v^2$. Define (R, Φ) as the polar representation of the point $(X_1, X_2)$, i.e.,

$$X_1 = R\cos\Phi, \quad X_2 = R\sin\Phi$$

where R ≥ 0 and Φ ∈ [0, 2π].

(a) Find the joint density of R and Φ.

(b) Observe from (a) that R, Φ are independent. Show that Φ is uniformly distributed in [0, 2π], and find the marginal density of R.

(c) Find the marginal density of $R^2$.

(d) What is the probability that $R^2$ is at least 20 dB below its mean value? Does your answer depend on the value of $v^2$?

Remark:  The  random  variable  R  is  said  to  have  a  Rayleigh  distribution.  Further,  you  should 
recognize  that  R^  has  an  exponential  distribution. 

(e) Now, assume that $X_1 \sim N(m_1, v^2)$, $X_2 \sim N(m_2, v^2)$ are independent, where $m_1$ and $m_2$ may be nonzero. Find the joint density of R and Φ, and the marginal density of R. Express the latter in terms of the modified Bessel function

$$I_0(x) = \frac{1}{2\pi}\int_0^{2\pi} \exp(x\cos\theta)\,d\theta$$

Remark:  The  random  variable  R  is  said  to  have  a  Rician  distribution  in  this  case.  This 
specializes  to  a  Rayleigh  distribution  when  mi  =  m2  =  0. 


Random  Processes 

Problem 5.22 Let $X(t) = 2\sin(20\pi t + \Theta)$, where Θ takes values with equal probability in the set {0, π/2, π, 3π/2}.

(a)  Find  the  ensemble-averaged  mean  function  and  autocorrelation  function  of  X. 

(b)  Is  X  WSS? 

(c)  Is  X  stationary? 

(d) Find the time-averaged mean and autocorrelation function of X. Do these depend on the realization of Θ?

(e)  Is  X  ergodic  in  mean  and  autocorrelation? 


Problem 5.23 Let $X(t) = U_c\cos 2\pi f_c t - U_s\sin 2\pi f_c t$, where $U_c, U_s$ are i.i.d. N(0, 1) random variables.

(a) Specify the distribution of X(t) for each possible value of t.

(b) Show that you can rewrite $X(t) = A\cos(2\pi f_c t + \Theta)$, specifying the joint distribution of A and Θ.

Hint:  You  can  use  Example  5.4.3. 

(c)  Compute  the  ensemble-averaged  mean  function  and  autocorrelation  function  of  X.  Is  X 
WSS? 

(d)  Is  X  ergodic  in  mean? 

(e)  Is  X  ergodic  in  autocorrelation? 


Problem  5.24  For  each  of  the  following  functions,  sketch  it  and  state  whether  it  can  be  a  valid 
autocorrelation  function.  Give  reasons  for  your  answers. 

(a) $f_1(\tau) = (1 - |\tau|)\,I_{[-1,1]}(\tau)$.

(b) $f_2(\tau) = f_1(\tau - 1)$.

(c)  fsir)  =  /i(r)  -  I  (/i(r  -  1)  +  /i(r  +  1)). 


Problem 5.25 Consider the random process $X_p(t) = X_c(t)\cos 2\pi f_c t - X_s(t)\sin 2\pi f_c t$, where $X_c, X_s$ are random processes defined on a common probability space.

(a) Find conditions on $X_c$ and $X_s$ such that $X_p$ is WSS.

(b)  Specify  the  (ensemble  averaged)  autocorrelation  function  and  PSD  of  Xp  under  the  conditions 
in  (a). 

(c)  Assuming  that  the  conditions  in  (a)  hold,  what  are  the  additional  conditions  for  Xp  to  be  a 
passband  random  process? 
Problem 5.26 Consider the square wave x(t) = where $p(t) = I_{[-1/2,1/2]}(t)$.

(a) Find the time-averaged autocorrelation function of x by direct computation in the time domain.

Hint:  The  autocorrelation  function  of  a  periodic  signal  is  periodic. 

(b)  Find  the  Fourier  series  for  x,  and  use  this  to  find  the  PSD  of  x. 

(c)  Are  the  answers  in  (a)  and  (b)  consistent? 


Problem 5.27 Consider again the square wave x(t) = where $p(t) = I_{[-1/2,1/2]}(t)$. Define the random process X(t) = x(t − D), where D is a random variable which is uniformly distributed over the interval [0, 1].

(a)  Find  the  ensemble  averaged  autocorrelation  function  of  X. 

(b)  Is  X  WSS? 

(c)  Is  X  stationary? 

(d)  Is  X  ergodic  in  mean  and  autocorrelation  function? 

Problem  5.28  Let  n{t)  denote  a  zero  mean  baseband  random  process  with  PSD  Sn{f)  = 
Find  and  sketch  the  PSD  of  the  following  random  processes. 

(a)  xi{t)  =  ^{t). 

(b)  X2{t)  =  for  d=\. 

(c)  Find  the  powers  of  Xi  and  X2- 

Problem  5.29  Consider  a  WSS  random  process  with  autocorrelation  function  Rx{t)  = 
where  a  >  0. 

(a)  Find  the  output  power  when  X  is  passed  through  an  ideal  LPF  of  bandwidth  W. 

(b)  Find  the  99%  power  containment  bandwidth  of  X.  How  does  it  scale  with  the  parameter  a? 


Figure 5.28: Baseband communication system in Problem 5.30 (blocks: channel, equalizer).


Problem  5.30  Consider  the  baseband  communication  system  depicted  in  Figure  5.28,  where  the 
message  is  modeled  as  a  random  process  with  PSD  Sm{f)  ~  ^  “  ■^)  -^[-2,2](/)-  Receiver  noise 

is  modeled  as  bandlimited  white  noise  with  two-sided  PSD  Sn{f)  =  \P[-3,3]{f)-  The  equalizer 
removes  the  signal  distortion  due  to  the  channel. 

(a)  Find  the  signal  power  at  the  channel  input. 

(b)  Find  the  signal  power  at  the  channel  output. 

(c)  Find  the  SNR  at  the  equalizer  input. 

(d)  Find  the  SNR  at  the  equalizer  output. 

Problem 5.31 A zero mean WSS random process X has power spectral density $S_X(f) = (1 - |f|)\,I_{[-1,1]}(f)$.

(a) Find E[X(100)X(100.5)], leaving your answer in as explicit a form as you can.

(b)  Find  the  output  power  when  X  is  passed  through  a  filter  with  impulse  response  h{t)  =  sinct. 
Problem 5.32 A signal s(t) in a communication system is modeled as a zero mean random process with PSD $S_s(f) = (1 - |f|)\,I_{[-1,1]}(f)$. The received signal is given by y(t) = s(t) + n(t),
where  n  is  WGN  with  PSD  Sn{f)  =  0.001.  The  received  signal  is  passed  through  an  ideal  lowpass 
filter  with  transfer  function  H{f)  = 

(a)  Find  the  SNR  (ratio  of  signal  power  to  noise  power)  at  the  filter  input. 

(b)  Is  the  SNR  at  the  filter  output  better  for  R  =  1  or  R  =  |?  Give  a  quantitative  justification 
for  your  answer. 

Problem 5.33 White noise n with PSD ^ is passed through an RC filter with impulse response h(t) = where $T_0$ is the RC time constant, to obtain the output y = n * h.

(a)  Find  the  autocorrelation  function,  PSD  and  power  of  y. 

(b)  Assuming  now  that  the  noise  is  a  Gaussian  random  process,  find  a  value  of  to  such  that 
y(to)  —  ||/(0)  is  independent  of  |/(0),  or  say  why  such  a  to  cannot  be  found. 

Problem  5.34  Find  the  noise  power  at  the  output  of  the  filter  for  the  following  two  scenarios: 

(a)  Baseband  white  noise  with  (two-sided)  PSD  ^  is  passed  through  a  filter  with  impulse 
response  h(t)  =  sinc^t. 

(b)  Passband  white  noise  with  (two-sided)  PSD  ^  is  passed  through  a  filter  with  impulse  response 
h{t)  =  sinc^t  cos  lOOvrt. 

Problem  5.35  Suppose  that  WGN  n{t)  with  PSD  =  ^  =  1  is  passed  through  a  filter  with 
impulse response $h(t) = I_{[-1,1]}(t)$ to obtain the output y(t) = (n * h)(t).

(a) Find and sketch the output power spectral density $S_y(f)$, carefully labeling the axes.

(b) Specify the joint distribution of the three consecutive samples y(1), y(2), y(3).

(c) Find the probability that y(1) − 2y(2) + y(3) exceeds 10.

Problem  5.36  (computations  involving  deterministic  signal  plus  WGN)  Gonsider  the 
noisy  received  signal 

y{t)  =  s{t)  +  n{t) 

where $s(t) = I_{[0,3]}(t)$ and n(t) is WGN with PSD $N_0/2 = 1/4$. The receiver computes the
following  statistics: 

Fi  =  ^  y{t)dt  ,  ^2  =  ^  y{t)dt 

(a)  Specify  the  joint  distribution  of  Yi  and  Y2. 

(b) Compute the probability $P[Y_1 + Y_2 < 2]$, expressing it in terms of the Q function with positive arguments.

Problem  5.37  (filtered  WGN)  Let  n{t)  denote  WGN  with  PSD  S'„(/)  =  We  pass  n{t) 
through a filter with impulse response $h(t) = I_{[0,1]}(t) - I_{[1,2]}(t)$ to obtain z(t) = (n * h)(t).

(a) Find and sketch the autocorrelation function of z(t).

(b) Specify the joint distribution of z(49) and z(50).

(c) Specify the joint distribution of z(49) and z(52).

(d) Evaluate the probability P[2z(50) > z(49) + z(51)]. Assume $\sigma^2 = 1$.

(e) Evaluate the probability P[2z(50) > z(49) + z(51) + 2]. Assume $\sigma^2 = 1$.

Problem  5.38  (filtered  WGN)  Let  n{t)  denote  WGN  with  PSD  S'„(/)  =  We  pass  n{t) 
through a filter with impulse response $h(t) = 2I_{[0,2]}(t) - I_{[1,2]}(t)$ to obtain z(t) = (n * h)(t).

(a) Find and sketch the autocorrelation function of z(t).

(b) Specify the joint distribution of z(0), z(1), z(2).

(c) Compute the probability P[z(0) − z(1) + z(2) > 4] (assume $\sigma^2 = 1$).
Problem 5.39 (filtered and sampled WGN) Let n(t) denote WGN with PSD S_n(f) =

We pass n(t) through a filter with impulse response h(t) to obtain z(t) = (n * h)(t), and then sample it at rate $1/T_s$ to obtain the sequence $z[n] = z(nT_s)$, where n takes integer values.

(a) Show that

$$\mathrm{cov}(z[n], z[m]) = E\left[z[n]z^*[m]\right] = \frac{N_0}{2}\int h(t)h^*(t - (n-m)T_s)\,dt$$

(We are interested in real-valued impulse responses, but we continue to develop a framework general enough to encompass complex-valued responses.)

(b) For $h(t) = I_{[0,1]}(t)$, specify the joint distribution of $(z[1], z[2], z[3])^T$ for a sampling rate of 2 ($T_s = 1/2$).

(c) Repeat (b) for a sampling rate of 1.

(d) For a general h sampled at rate $1/T_s$, show that the noise samples are independent if h(t) is square root Nyquist at rate $1/T_s$.

Problem  5.40  Consider  the  signal  s{t)  =  /[o,2](t)  —  2/[i^3](f). 

(a) Find and sketch the impulse response $s_{mf}(t)$ of the matched filter for $s$.

(b) Find and sketch the output when $s(t)$ is passed through its matched filter.

(c) Suppose that, instead of the matched filter, all we have available is a filter with impulse
response $h(t) = I_{[0,1]}(t)$. For an arbitrary input signal $x(t)$, show how $z(t) = (x * s_{mf})(t)$ can be
synthesized from $y(t) = (x * h)(t)$.

Problem 5.41 (Correlation via filtering and sampling) A signal $x(t)$ is passed through a
filter with impulse response $h(t) = I_{[0,2]}(t)$ to obtain an output $y(t) = (x * h)(t)$.

(a) Find and sketch a signal $g_1(t)$ such that

$$y(2) = \langle x, g_1 \rangle = \int x(t) g_1(t)\, dt$$

(b) Find and sketch a signal $g_2(t)$ such that

//(!)  -  2|/(1)  =  {x,g2)  =  j  x{t)g2{t)dt 

Problem 5.42 (Correlation via filtering and sampling) Let us generalize the result we
were hinting at in Problem 5.41. Suppose an arbitrary signal $x$ is passed through an arbitrary
filter $h(t)$ to obtain output $y(t) = (x * h)(t)$.

(a) Show that taking a linear combination of samples at the filter output is equivalent to a
correlation operation on $x$. That is, show that

$$\sum_{i=1}^{n} a_i y(t_i) = \langle x, g \rangle = \int x(t) g(t)\, dt$$

where

$$g(t) = \sum_{i=1}^{n} a_i h(t_i - t) = \sum_{i=1}^{n} a_i h_{mf}(t - t_i) \qquad (5.108)$$

That is, taking a linear combination of samples is equivalent to correlating against a signal which
is a linear combination of shifted versions of the matched filter for $h$.
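The identity (5.108) can be checked numerically. The sketch below is a discrete-time stand-in, assuming signals sampled with a hypothetical step `dt` so that integrals become Riemann sums; the input `x`, the sampling indices, and the coefficients `a` are arbitrary illustrative choices, not values from the problem. Both sides agree because the order of the two sums can be swapped, which is the argument behind (5.108).

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.01                        # hypothetical sampling step for the discretized signals
x = rng.standard_normal(400)     # arbitrary input x[n], standing in for x(n*dt)
h = np.ones(100)                 # h(t) = I_[0,1](t) sampled with step dt

# y(t) = (x * h)(t) as a Riemann sum: y[k] = dt * sum_n x[n] h[k - n]
y = dt * np.convolve(x, h)

sample_idx = [150, 250, 350]     # hypothetical sampling instants t_i = idx * dt
a = [1.0, -2.0, 0.5]             # hypothetical combining coefficients a_i

# Left side of (5.108): linear combination of filter output samples
lhs = sum(ai * y[k] for ai, k in zip(a, sample_idx))

# g[n] = sum_i a_i h[t_i - n]: shifted, time-reversed copies of h
n = np.arange(len(x))
g = np.zeros(len(x))
for ai, k in zip(a, sample_idx):
    lag = k - n
    valid = (lag >= 0) & (lag < len(h))
    g[valid] += ai * h[lag[valid]]

# Right side of (5.108): correlation <x, g> as a Riemann sum
rhs = dt * np.dot(x, g)
print(lhs, rhs)
```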

(b) The preceding result can be applied to approximate a correlation operation by taking linear
combinations at the output of a filter. Suppose that we wish to perform a correlation against a
triangular pulse $g(t) = (1 - |t|) I_{[-1,1]}(t)$. How would you approximate this operation by taking a
linear combination of samples at the output of a filter with impulse response $h(t) = I_{[0,1]}(t)$?




Problem 5.43 (Approximating a correlator by filtering and sampling) Consider the
noisy signal

$$y(t) = s(t) + n(t)$$

where $s(t) = (1 - |t|) I_{[-1,1]}(t)$ and $n(t)$ is white noise with $S_n(f) = 0.1$.

(a) Compute the SNR at the output of the integrator


(b) Can you improve the SNR by modifying the integration in (a), while keeping the processing
linear?  If  so,  say  how.  If  not,  say  why  not. 

(c) Now, suppose that $y(t)$ is passed through a filter with impulse response $h(t) = I_{[0,1]}(t)$ to
obtain $z(t) = (y * h)(t)$. If you were to sample the filter output at a single time $t = t_0$, how
would you choose $t_0$ so as to maximize the SNR?

(d) In the setting of (c), if you were now allowed to take three samples at times $t_1$, $t_2$, and $t_3$ and
generate a linear combination $a_1 z(t_1) + a_2 z(t_2) + a_3 z(t_3)$, how would you choose $\{a_i\}$, $\{t_i\}$ to
improve the SNR relative to (c)? (We are looking for intuitively sensible answers rather than a
provably optimal choice.)

Hint: See Problem 5.42. Taking linear combinations of samples at the output of a filter is
equivalent  to  correlation  with  an  appropriate  waveform,  which  we  can  choose  to  approximate  the 
optimal  correlator. 


Mathematical  derivations 


Problem 5.44 (Bounds on the Q function) We derive the bounds (5.117) and (5.116) for

$$Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt \qquad (5.109)$$


(a) Show that, for $x > 0$, the following upper bound holds:

$$Q(x) \leq \frac{1}{2} e^{-x^2/2}$$

Hint: Try pulling out a factor of $e^{-x^2/2}$ from (5.109), and then bounding the resulting integrand.
Observe that $t \geq x > 0$ in the integration interval.

(b) For $x > 0$, derive the following upper and lower bounds for the Q function:

$$\left(1 - \frac{1}{x^2}\right) \frac{e^{-x^2/2}}{\sqrt{2\pi}\, x} \leq Q(x) \leq \frac{e^{-x^2/2}}{\sqrt{2\pi}\, x}$$

Hint: Write the integrand in (5.109) as a product of $1/t$ and $\frac{t}{\sqrt{2\pi}} e^{-t^2/2}$, and then integrate by parts
to get the upper bound. Integrate by parts once more using a similar trick to get the lower
bound. Note that you can keep integrating by parts to get increasingly refined upper and lower
bounds.


Problem 5.45 (Geometric derivation of Q function bound) Let $X_1$ and $X_2$ denote independent
standard Gaussian random variables.

(a) For $a > 0$, express $P[|X_1| > a, |X_2| > a]$ in terms of the Q function.

(b) Find $P[X_1^2 + X_2^2 > 2a^2]$.

Hint: Transform to polar coordinates. Or use the results of Problem 5.21.

(c) Sketch the regions in the $(x_1, x_2)$ plane corresponding to the events considered in (a) and (b).

(d) Use (a)-(c) to obtain an alternative derivation of the bound $Q(x) \leq \frac{1}{2} e^{-x^2/2}$ for $x > 0$ (i.e.,
the bound in Problem 5.44(a)).
Problem 5.46 (Cauchy-Schwarz inequality for random variables) For random variables
$X$ and $Y$ defined on a common probability space, define the mean squared error in approximating
$X$ by a multiple of $Y$ as

$$J(a) = E\left[(X - aY)^2\right]$$

where  a  is  a  scalar.  Assume  that  both  random  variables  are  nontrivial  (i.e.,  neither  of  them  is 
zero  with  probability  one). 

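(a) Show that

$$J(a) = E[X^2] + a^2 E[Y^2] - 2a E[XY]$$

(b) Since $J(a)$ is quadratic in $a$ with positive leading coefficient $E[Y^2]$ (positive because $Y$ is nontrivial), it has a global minimum (corresponding to the best approximation
of $X$ by a multiple of $Y$). Show that this is achieved for $a_{opt} = \frac{E[XY]}{E[Y^2]}$.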

(c) Show that the mean squared error in the best approximation found in (b) can be written as

$$J(a_{opt}) = E[X^2] - \frac{(E[XY])^2}{E[Y^2]}$$

(d) Since the approximation error is nonnegative, conclude that

$$(E[XY])^2 \leq E[X^2]\, E[Y^2] \qquad (5.110)$$

This is the Cauchy-Schwarz inequality for random variables.

(e)  Conclude  also  that  equality  is  achieved  in  (5.110)  if  and  only  if  X  and  Y  are  scalar  multiples 
of  each  other. 

Hint: Equality corresponds to $J(a_{opt}) = 0$.


Problem 5.47 (Normalized correlation) (a) Apply the Cauchy-Schwarz inequality in the
previous problem to “zero mean” versions of the random variables, $X_1 = X - E[X]$, $Y_1 = Y - E[Y]$,
to obtain that

$$|\mathrm{cov}(X, Y)| \leq \sqrt{\mathrm{var}(X)\, \mathrm{var}(Y)} \qquad (5.111)$$

(b) Conclude that the normalized correlation $\rho(X, Y)$ defined in (5.59) lies in $[-1, 1]$.

(c) Show that $|\rho| = 1$ if and only if we can write $X = aY + b$. Specify the constants $a$ and $b$ in
terms of the means and covariances associated with the two random variables.


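Problem 5.48 (Characteristic function of a Gaussian random vector) Consider a Gaussian
random vector $\mathbf{X} = (X_1, \ldots, X_m)^T \sim N(\mathbf{m}, \mathbf{C})$. The characteristic function of $\mathbf{X}$ is defined
as follows:

$$\phi_{\mathbf{X}}(\mathbf{w}) = E\left[e^{j \mathbf{w}^T \mathbf{X}}\right] \qquad (5.112)$$

The characteristic function completely characterizes the distribution of a random vector, even if a
density does not exist. If the density does exist, the characteristic function is a multidimensional
inverse Fourier transform of it:

$$\phi_{\mathbf{X}}(\mathbf{w}) = \int e^{j \mathbf{w}^T \mathbf{x}}\, p_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}$$

The density is therefore given by the corresponding Fourier transform

$$p_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^m} \int e^{-j \mathbf{w}^T \mathbf{x}}\, \phi_{\mathbf{X}}(\mathbf{w})\, d\mathbf{w} \qquad (5.113)$$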

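(a) Show that $Y = \mathbf{w}^T \mathbf{X}$ is a Gaussian random variable with mean $\mu = \mathbf{w}^T \mathbf{m}$ and variance
$v^2 = \mathbf{w}^T \mathbf{C} \mathbf{w}$.

(b) For $Y \sim N(\mu, v^2)$, show that

$$E\left[e^{jY}\right] = e^{j\mu - v^2/2}$$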
(c) Use the result of (b) to obtain that the characteristic function of $\mathbf{X}$ is given by

$$\phi_{\mathbf{X}}(\mathbf{w}) = e^{j \mathbf{w}^T \mathbf{m} - \frac{1}{2} \mathbf{w}^T \mathbf{C} \mathbf{w}} \qquad (5.114)$$


which  depends  only  on  m  and  C. 

(d) Since the distribution of $\mathbf{X}$ is completely specified by its characteristic function, conclude that
the distribution of a Gaussian random vector depends only on its mean vector and covariance
matrix. When $\mathbf{C}$ is invertible, we can compute the density (5.58) by taking the Fourier transform
of the characteristic function in (5.114), but we skip that derivation.


Problem 5.49 Consider a zero mean WSS random process $X$ with autocorrelation function
$R_X(\tau)$. Let $Y_1(t) = (X * h_1)(t)$ and $Y_2(t) = (X * h_2)(t)$ denote random processes obtained by
passing $X$ through LTI systems with impulse responses $h_1$ and $h_2$, respectively.

(a) Find the crosscorrelation function $R_{Y_1, Y_2}(t_1, t_2)$.

Hint: You can use the approach employed to obtain (5.101), first finding $R_{Y_1, X}$ and then $R_{Y_1, Y_2}$.

(b) Are $Y_1$ and $Y_2$ jointly WSS?

(c) Suppose that $X$ is white noise with PSD $S_X(f) = 1$, $h_1(t) = I_{[0,1]}(t)$ and $h_2(t) = e^{-t} I_{[0,\infty)}(t)$.
Find $E[Y_1(0) Y_2(0)]$ and $E[Y_1(0) Y_2(1)]$.


Problem 5.50 (Cauchy-Schwarz inequality for signals) Consider two signals (assume
real-valued for simplicity, although the results we are about to derive apply for complex-valued
signals as well) $u(t)$ and $v(t)$.

(a) We wish to approximate $u(t)$ by a scalar multiple of $v(t)$ so as to minimize the norm of the
error. Specifically, we wish to minimize

$$J(a) = \int |u(t) - a v(t)|^2\, dt = \|u - av\|^2 = \langle u - av, u - av \rangle$$

Show that

$$J(a) = \|u\|^2 + a^2 \|v\|^2 - 2a \langle u, v \rangle$$

(b) Show that the quadratic function $J(a)$ is minimized by choosing $a = a_{opt}$, given by

$$a_{opt} = \frac{\langle u, v \rangle}{\|v\|^2}$$

Show that the corresponding approximation $a_{opt} v$ can be written as a projection of $u$ along a
unit vector in the direction of $v$:

$$a_{opt} v = \left\langle u, \frac{v}{\|v\|} \right\rangle \frac{v}{\|v\|}$$

(c) Show that the error due to the optimal setting is given by

$$J(a_{opt}) = \|u\|^2 - \frac{\langle u, v \rangle^2}{\|v\|^2}$$

(d) Since the minimum error is non-negative, conclude that

$$|\langle u, v \rangle| \leq \|u\| \|v\| \qquad (5.115)$$

This is the Cauchy-Schwarz inequality, which applies to real- and complex-valued signals or
vectors.

(e) Conclude also that equality in (5.115) occurs if and only if $u$ is a scalar multiple of $v$ or if $v$
is a scalar multiple of $u$. (We need to say it both ways in case one of the signals is zero.)
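A quick numerical sanity check of parts (b) through (e), treating sampled signals as vectors. This is a sketch under an assumption: the inner product is approximated by a plain dot product, and `u`, `v` are random stand-ins rather than signals from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(1000)   # hypothetical sampled signal u
v = rng.standard_normal(1000)   # hypothetical sampled signal v

def J(a):
    """Squared norm of the approximation error u - a v."""
    return np.sum((u - a * v) ** 2)

a_opt = np.dot(u, v) / np.dot(v, v)                              # part (b)
err_formula = np.dot(u, u) - np.dot(u, v) ** 2 / np.dot(v, v)    # part (c)

print(J(a_opt), err_formula)
# Cauchy-Schwarz (5.115): |<u, v>| <= ||u|| ||v||
print(abs(np.dot(u, v)), np.linalg.norm(u) * np.linalg.norm(v))
```

Perturbing `a_opt` in either direction increases `J`, and replacing `v` by a scalar multiple of `u` turns the inequality into an equality, as part (e) predicts.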
Problem 5.51 Consider a random process $X$ passed through a correlator $g$ to obtain

$$Z = \int_{-\infty}^{\infty} X(t) g(t)\, dt$$

where $X(t)$, $g(t)$ are real-valued.

(a) Show that the mean and variance of $Z$ can be expressed in terms of the mean function and
autocovariance function of $X$ as follows:

$$E[Z] = \int m_X(t) g(t)\, dt = \langle m_X, g \rangle$$

$$\mathrm{var}(Z) = \int \int C_X(t_1, t_2) g(t_1) g(t_2)\, dt_1\, dt_2$$

(b) Suppose now that $X$ is zero mean and WSS with autocorrelation $R_X(\tau)$. Show that the
variance of the correlator output can be written as

$$\mathrm{var}(Z) = \int R_X(\tau) R_g(\tau)\, d\tau = \langle R_X, R_g \rangle$$

where $R_g(\tau) = (g * g_{MF})(\tau) = \int g(t) g(t - \tau)\, dt$ is the “autocorrelation” of the waveform $g$.
Hint: An alternative to doing this from scratch is to use the equivalence of correlation and
matched filtering. You can then employ (5.101), which gives the output autocorrelation function
when a WSS process is sent through an LTI system, evaluate it at zero lag to find the power,
and use the symmetry of autocorrelation functions.
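The discrete-time analogue of part (b) can be checked directly, with sums in place of integrals. This sketch assumes a hypothetical zero-mean WSS sequence with $R_X[0] = 1.25$, $R_X[\pm 1] = 0.5$ (the moving average $W[n] + 0.5W[n-1]$ of unit-variance white noise) and an arbitrary four-sample correlator `g`; it compares $\mathbf{g}^T \mathbf{C} \mathbf{g}$ with $\sum_k R_X[k] R_g[k]$.

```python
import numpy as np

def r_x(k):
    """Hypothetical autocorrelation of X[n] = W[n] + 0.5 W[n-1], W white with unit variance."""
    k = abs(k)
    if k == 0:
        return 1.25
    if k == 1:
        return 0.5
    return 0.0

g = np.array([1.0, -1.0, 2.0, 0.5])  # hypothetical correlator samples g[n]
N = len(g)

# Direct route, as in part (a): var(Z) = sum_n sum_m R_X[n - m] g[n] g[m] = g^T C g
C = np.array([[r_x(n - m) for m in range(N)] for n in range(N)])
var_direct = g @ C @ g

# Route from part (b): var(Z) = sum_k R_X[k] R_g[k], with R_g[k] = sum_n g[n] g[n - k]
def r_g(k):
    return sum(g[n] * g[n - k] for n in range(N) if 0 <= n - k < N)

var_via_rg = sum(r_x(k) * r_g(k) for k in range(-(N - 1), N))
print(var_direct, var_via_rg)  # both equal 5.8125
```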


Problems  drawing  on  material  from  Chapter  3  and  Appendix  5.E 

These can be skipped by readers primarily interested in the digital communication material in the
succeeding chapters.


Problem  5.52  Consider  a  noisy  FM  signal  of  the  form 

$$v(t) = 20 \cos(2\pi f_c t + \theta_s(t)) + n(t)$$

where  n{t)  is  WGN  with  power  spectral  density  ^  =  10“®,  and  0s(t)  is  the  instantaneous  phase 
deviation of the noiseless FM signal. Assume that the bandwidth of the noiseless FM signal is
100 kHz.

(a) The noisy signal $v(t)$ is passed through an ideal BPF which exactly spans the 100 kHz
frequency band occupied by the noiseless signal. What is the SNR at the output of the BPF?

(b) The output of the BPF is passed through an ideal phase detector, followed by a differentiator
which is normalized to give unity gain at 10 kHz, and an ideal (unity gain) LPF of bandwidth
10 kHz.

(i)  Sketch  the  noise  PSD  at  the  output  of  the  differentiator. 

(ii)  Find  the  noise  power  at  the  output  of  the  LPF. 

Problem 5.53 An FM signal of bandwidth 210 kHz is received at a power of -90 dBm, and is
corrupted  by  bandpass  AWGN  with  two-sided  PSD  10“^^  watts/Hz.  The  message  bandwidth  is 
5 kHz, and the peak-to-average power ratio for the message is 10 dB.

(a)  What  is  the  SNR  (in  dB)  for  the  received  FM  signal?  (Assume  that  the  noise  is  bandlimited 
to  the  band  occupied  by  the  FM  signal.) 

(b)  Estimate  the  peak  frequency  deviation. 

(c)  The  noisy  FM  signal  is  passed  through  an  ideal  phase  detector.  Estimate  and  sketch  the 
noise  PSD  at  the  output  of  the  phase  detector,  carefully  labeling  the  axes. 

(d) The output of the phase detector is passed through a differentiator with transfer function
$H(f) = jf$, and then an ideal lowpass filter of bandwidth 5 kHz. Estimate the SNR (in dB) at
the output of the lowpass filter.
Problem 5.54 A message signal $m(t)$ is modeled as a zero mean random process with PSD


Smif)  =  l/|/[-2,2](/) 


We  generate  an  SSB  signal  as  follows: 

$$u(t) = 20\left[m(t) \cos 200\pi t - \hat{m}(t) \sin 200\pi t\right]$$

where $\hat{m}$ denotes the Hilbert transform of $m$.

(a)  Find  the  power  of  m  and  the  power  of  u. 

(b) The noisy received signal is given by $y(t) = u(t) + n(t)$, where $n$ is passband AWGN with PSD
^  =  1,  and  is  independent  of  u.  Draw  the  block  diagram  for  an  ideal  synchronous  demodulator 
for  extracting  the  message  m  from  y,  specifying  the  carrier  frequency  as  well  as  the  bandwidth 
of the LPF, and find the SNR at the output of the demodulator.

(c) Find the signal-to-noise-plus-interference ratio if the local carrier for the synchronous demodulator
has a phase error of


5.A Q function bounds and asymptotics


The following upper and lower bounds on $Q(x)$ (derived in Problem 5.44) are asymptotically
tight for large arguments; that is, the ratio of the lower bound to the upper bound, $1 - 1/x^2$, tends to one as $x$ gets
large.

Bounds on $Q(x)$, asymptotically tight for large arguments

$$\left(1 - \frac{1}{x^2}\right) \frac{e^{-x^2/2}}{\sqrt{2\pi}\, x} \leq Q(x) \leq \frac{e^{-x^2/2}}{\sqrt{2\pi}\, x}, \qquad x > 0 \qquad (5.116)$$


The asymptotic behavior (5.53) follows from these bounds. However, they do not work well for
small $x$ (the upper bound blows up to $\infty$, and the lower bound to $-\infty$, as $x \to 0$). The following
upper bound is useful for both small and large values of $x \geq 0$: it gives accurate results for small
$x$, and, while it is not as tight as the bounds (5.116) for large $x$, it does give the correct exponent
of decay.

Upper bound on $Q(x)$ useful for both small and large arguments

$$Q(x) \leq \frac{1}{2} e^{-x^2/2}, \qquad x \geq 0 \qquad (5.117)$$


Figure 5.29 plots $Q(x)$ and its bounds for positive $x$. A logarithmic scale is used for the values
of the function in order to demonstrate the rapid decay with $x$. The bounds (5.116) are seen to
be tight even at moderate values of $x$ (say $x \geq 2$), while the bound (5.117) shows the right rate
of decay for large $x$ and also remains useful for small $x$.
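The bounds (5.116) and (5.117) are easy to tabulate. This sketch evaluates $Q(x)$ through the standard identity $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$ using Python's `math.erfc`; the function names are illustrative.

```python
import math

def q_func(x):
    """Q(x) = P[N(0,1) > x], via the identity Q(x) = erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def q_bounds_large_x(x):
    """Lower and upper bounds (5.116); meaningful for x > 0."""
    upper = math.exp(-x * x / 2) / (math.sqrt(2 * math.pi) * x)
    lower = (1 - 1 / (x * x)) * upper
    return lower, upper

def q_bound_simple(x):
    """Upper bound (5.117), valid for x >= 0."""
    return 0.5 * math.exp(-x * x / 2)

for x in [0.5, 1.0, 2.0, 4.0, 6.0]:
    lo, hi = q_bounds_large_x(x)
    print(f"x={x}: {lo:.3e} <= Q(x)={q_func(x):.3e} <= {hi:.3e}, (5.117) gives {q_bound_simple(x):.3e}")
```

At small $x$ the lower bound in (5.116) is negative and the upper bound exceeds 1/2, while (5.117) stays informative; at larger $x$ the pair (5.116) brackets $Q(x)$ much more tightly.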


5.B  Approximations  using  Limit  Theorems 

We often deal with sums of independent (or approximately independent) random variables. Finding
the exact distribution of such sums can be cumbersome. This is where limit theorems, which
characterize what happens to these sums as the number of terms gets large, come in handy.

Law of large numbers (LLN): Suppose that $X_1, X_2, \ldots$ are i.i.d. random variables with finite
mean $m$. Then their empirical average $(X_1 + \ldots + X_n)/n$ converges to their statistical average
$E[X_i] = m$ as $n \to \infty$. (Let us not worry about exactly how convergence is defined for a sequence
of random variables.)

Figure 5.29: The Q function and bounds.

When  we  do  a  simulation  to  estimate  some  quantity  of  interest  by  averaging  over  multiple  runs, 
we  are  relying  on  the  LLN.  The  LLN  also  underlies  all  of  information  theory,  which  is  the  basis 
for  computing  performance  benchmarks  for  coded  communication  systems. 
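As a small illustration of the LLN behind simulation-based estimates, this sketch averages i.i.d. Bernoulli($p$) samples ($p = 0.3$ is an arbitrary choice) and shows the empirical average settling toward $p$ as $n$ grows.

```python
import random

def empirical_average(n, p=0.3, seed=0):
    """Average of n i.i.d. Bernoulli(p) samples; by the LLN it approaches p."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n)) / n

for n in [10, 1_000, 100_000]:
    print(n, empirical_average(n))
```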


The  LLN  tells  us  that  the  empirical  average  of  i.i.d.  random  variables  tends  to  the  statistical 
average.  The  central  limit  theorem  characterizes  the  variation  around  the  statistical  average. 


Central limit theorem (CLT): Suppose that $X_1, X_2, \ldots$ are i.i.d. random variables with finite
mean $m$ and variance $v^2$. Then the distribution of $Y_n = \frac{X_1 + \ldots + X_n - nm}{\sqrt{nv^2}}$ tends to that of a standard
Gaussian random variable. Specifically,

$$\lim_{n \to \infty} P\left[\frac{X_1 + \ldots + X_n - nm}{\sqrt{nv^2}} \leq x\right] = \Phi(x) \qquad (5.118)$$


Notice that the sum $S_n = X_1 + \ldots + X_n$ has mean $nm$ and variance $nv^2$. Thus, the CLT is telling
us that $Y_n$, a normalized, zero mean, unit variance version of $S_n$, has a distribution that tends to
$N(0, 1)$ as $n$ gets large. In practical terms, this translates to using the CLT to approximate $S_n$
as a Gaussian random variable with mean $nm$ and variance $nv^2$ for “large enough” $n$. In many
scenarios, the CLT kicks in rather quickly, and the Gaussian approximation works well for values
of $n$ as small as 6-10.


Example 5.B.1 (Gaussian approximation for a binomial distribution) Consider a binomial
random variable with parameters $n$ and $p$. We know that we can write it as $S_n = X_1 + \ldots + X_n$,
where $X_i$ are i.i.d. Bernoulli($p$). Note that $E[X_i] = p$ and $\mathrm{var}(X_i) = p(1 - p)$, so that $S_n$ has mean
$np$ and variance $np(1 - p)$. We can therefore approximate Binomial($n$, $p$) by $N(np, np(1 - p))$
according to the CLT. The CLT tells us that we can approximate the CDF of a binomial by a
Gaussian: thus, the integral of the Gaussian density over $(-\infty, k]$ should approximate the sum
of the binomial pmf from 0 to $k$. The plot in Figure 5.30 shows that the Gaussian density itself
(with mean $np = 6$ and variance $np(1 - p) = 4.2$) approximates the binomial pmf quite well
around the mean, so that we do expect the corresponding CDFs to be close.
Figure 5.30: A binomial pmf with parameters $n = 20$ and $p = 0.3$, and its $N(6, 4.2)$ Gaussian
approximation.
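The CDF comparison in Example 5.B.1 can be reproduced numerically for $n = 20$, $p = 0.3$. The sketch evaluates the Gaussian CDF at $k$, as described in the text, and also at $k + 0.5$; the latter is the standard continuity correction, which the text does not discuss and is included only for comparison.

```python
import math

def binom_cdf(k, n, p):
    """Exact binomial CDF: sum of the pmf from 0 to k."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def gauss_cdf(x, mean, var):
    """CDF of N(mean, var), written via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))

n, p = 20, 0.3
mean, var = n * p, n * p * (1 - p)   # 6 and 4.2, as in Example 5.B.1

for k in range(13):
    exact = binom_cdf(k, n, p)
    plain = gauss_cdf(k, mean, var)            # Gaussian integral over (-inf, k]
    corrected = gauss_cdf(k + 0.5, mean, var)  # continuity correction (not in the text)
    print(f"k={k:2d}  exact={exact:.4f}  gauss={plain:.4f}  gauss(k+0.5)={corrected:.4f}")
```

Near the mean the uncorrected approximation is off by roughly 0.1, since the discrete CDF jumps by a full pmf value at each integer; shifting the evaluation point by half a step brings the error down to about 0.01 here.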


5.C  Noise  Mechanisms 


We  have  discussed  mathematical  models  for  noise.  We  provide  here  some  motivation  and  physical 
feel  for  how  noise  arises. 


Thermal  Noise:  Even  in  a  resistor  that  has  no  external  voltage  applied  across  it,  the  charge 
carriers  exhibit  random  motion  because  of  thermal  agitation,  just  as  the  molecules  in  a  gas  do. 
The  amount  of  motion  depends  on  the  temperature,  and  results  in  thermal  noise.  Since  the 
charge  carriers  are  equally  likely  to  move  in  either  direction,  the  voltages  and  currents  associated 
with  thermal  noise  have  zero  DC  value.  We  therefore  quantify  the  noise  power,  or  the  average 
squared values of voltages and currents associated with the noise. These were first measured by
Johnson, and then explained by Nyquist based on statistical thermodynamics arguments, in the
1920s. As a result, thermal noise is often called Johnson noise, or Johnson-Nyquist noise.

Using arguments that we shall not go into, Nyquist concluded that the mean squared value of
the voltage associated with a resistor $R$, measured in a small frequency band $[f, f + \Delta f]$, is given
by

$$\overline{v_n^2}(f, \Delta f) = 4RkT\Delta f \qquad (5.119)$$

where $R$ is the resistance in ohms, $k = 1.38 \times 10^{-23}$ Joules/Kelvin is Boltzmann’s constant, and $T$
is the temperature in degrees Kelvin ($T_{Kelvin} = T_{centigrade} + 273$). Notice that the mean squared
voltage depends only on the width of the frequency band, not its location; that is, thermal noise
is white. Actually, a more accurate statistical mechanics argument does reveal a dependence on
frequency, as follows:

$$\overline{v_n^2}(f, \Delta f) = \frac{4Rhf\Delta f}{e^{hf/(kT)} - 1} \qquad (5.120)$$

where $h = 6.63 \times 10^{-34}$ Joules/Hz denotes Planck’s constant, which relates the energy of a photon
to the frequency of the corresponding electromagnetic wave (readers may recall the famous
formula $E = h\nu$, where $\nu$ is the frequency of the photon). Now, $e^x \approx 1 + x$ for small $x$. Using
this in (5.120), we obtain that it reduces to (5.119) for $\frac{hf}{kT} \ll 1$, or $f \ll \frac{kT}{h} = f^*$. For $T = 290$ K,
we have $f^* \approx 6 \times 10^{12}$ Hz, or 6 THz. The practical operating range of communication frequencies
today is much less than this (existing and emerging systems operate well below 100 GHz), so
that thermal noise is indeed very well modeled as white for current practice.

For bandwidth $B$, (5.119) yields the mean squared voltage

$$\overline{v_n^2} = 4RkTB$$

Now, if we connect the noise source to a matched load of impedance $R$, half of the noise voltage appears across the load, so that the average power
delivered to the load is

$$\frac{\overline{v_n^2}/4}{R} = kTB \qquad (5.121)$$

The preceding calculation provides a valuable benchmark, giving the communication link designer
a ballpark estimate of how much noise power to expect in a receiver operating over a bandwidth $B$.
Of course, the noise for a particular receiver is typically higher than this benchmark, and must be
calculated based on detailed modeling and simulation of internal and external noise sources, and
the gains, input impedances, and output impedances for various circuit components. However,
while the circuit designer must worry about these details, once the design is complete, he or she
can supply the link designer with a single number for the noise power at the receiver output,
referred to the benchmark (5.121).
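The benchmark (5.121) is usually quoted in dBm. This sketch, using the constants given in the text, evaluates $kTB$ at $T = 290$ K and the frequency $f^* = kT/h$ below which thermal noise is effectively white; the 1 MHz bandwidth is an arbitrary example and the function names are illustrative.

```python
import math

K_BOLTZMANN = 1.38e-23   # J/K, value used in the text
H_PLANCK = 6.63e-34      # J/Hz (J*s), value used in the text

def thermal_noise_power_dbm(T, B):
    """kTB from (5.121), converted from watts to dBm."""
    p_watts = K_BOLTZMANN * T * B
    return 10 * math.log10(p_watts / 1e-3)

def white_noise_cutoff(T):
    """f* = kT/h: thermal noise is effectively white for f << f*."""
    return K_BOLTZMANN * T / H_PLANCK

print(round(thermal_noise_power_dbm(290, 1), 1))     # per hertz of bandwidth
print(round(thermal_noise_power_dbm(290, 1e6), 1))   # 1 MHz bandwidth
print(f"{white_noise_cutoff(290):.2e} Hz")
```

The per-hertz figure comes out to about -174 dBm, so each factor of ten in bandwidth adds 10 dB of noise power, and $f^*$ evaluates to about $6 \times 10^{12}$ Hz, matching the text.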

Shot  noise:  Shot  noise  occurs  because  of  the  discrete  nature  of  the  charge  carriers.  When  a 
voltage  applied  across  a  device  causes  current  to  flow,  if  we  could  count  the  number  of  charge 
carriers going from one point in the device to the other (e.g., from the source to the drain of
a transistor) over a time period $\tau$, we would see a random number $N(\tau)$, which would vary
independently across disjoint time periods. Under rather general assumptions, $N(\tau)$ is well
modeled as a Poisson random variable with mean $\lambda\tau$, where $\lambda$ scales with the DC current. The
variance of a Poisson random variable equals its mean, so that the variance of the rate of charge
carrier flow equals

$$\mathrm{var}\left(\frac{N(\tau)}{\tau}\right) = \frac{1}{\tau^2}\, \mathrm{var}(N(\tau)) = \frac{\lambda}{\tau}$$

We can think of this as the power of the shot noise. Thus, increasing the observation interval
$\tau$ smooths out the variations in charge carrier flow, and reduces the shot noise power. If we
now think of the device being operated over a bandwidth $B$, we know that we are effectively
observing the device at a temporal resolution $\tau \sim 1/B$. Thus, shot noise power scales linearly with
$B$.
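A Monte Carlo check of $\mathrm{var}(N(\tau)/\tau) = \lambda/\tau$, assuming Poisson counts as in the text. The rate `lam` and the intervals `tau` are hypothetical values chosen only for illustration.

```python
import numpy as np

def rate_variance(lam, tau, trials=200_000, seed=0):
    """Sample variance of N(tau)/tau when N(tau) ~ Poisson(lam * tau)."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam * tau, size=trials)
    return np.var(counts / tau)

lam = 1e4  # hypothetical mean rate of charge carriers (per second)
for tau in [1e-3, 1e-2]:
    print(f"tau={tau:g}: simulated {rate_variance(lam, tau):.4g}, predicted lam/tau = {lam / tau:.4g}")
```

Increasing $\tau$ tenfold cuts the variance of the measured rate tenfold, which is the smoothing effect described above.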

The preceding discussion indicates that both thermal noise and shot noise are white, in that
their power scales linearly with the system bandwidth $B$, independent of the frequency band of
operation. Indeed, both phenomena involve random motions of a large number of charge carriers,
and can be analyzed together in a statistical mechanics framework. This is well beyond our scope,
but for our purpose, we can simply model the aggregate system noise due to these two phenomena
as a single white noise process.

Flicker noise: Another commonly encountered form of noise is $1/f$ noise, also called flicker
noise, whose power increases as the frequency of operation gets smaller. The sources of $1/f$
noise are poorly understood, and white noise dominates in the typical operating regimes for
communication receivers. For example, in an RF system, the noise in the front end (antenna,
low noise amplifier, mixer) dominates the overall system noise, and $1/f$ noise is negligible at
these frequencies. We therefore ignore $1/f$ noise in our noise modeling.


5.D  The  structure  of  passband  random  processes 

We  discuss  here  the  modeling  of  passband  random  processes,  and  in  particular,  passband  white 
noise,  in  more  detail.  These  insights  are  useful  for  the  analysis  of  the  effect  of  noise  in  analog 
communication  systems,  as  in  Appendix  5.E. 

We can define the complex envelope, and I and Q components, for a passband random process
in exactly the same fashion as is done for deterministic signals in Chapter 2. For a passband
random process, each sample path (observed over a large enough time window) has a Fourier
transform restricted to passband. We can therefore define complex envelope, I/Q components
and envelope/phase as we do for deterministic signals. For any given reference frequency $f_c$ in
the band of interest, any sample path $x_p(t)$ for a passband random process can be written as

$$x_p(t) = \mathrm{Re}\left(x(t) e^{j2\pi f_c t}\right)$$
$$x_p(t) = x_c(t) \cos 2\pi f_c t - x_s(t) \sin 2\pi f_c t$$
$$x_p(t) = e(t) \cos(2\pi f_c t + \theta(t))$$

where $x(t) = x_c(t) + j x_s(t) = e(t) e^{j\theta(t)}$ is the complex envelope, $x_c(t)$, $x_s(t)$ are the I and Q
components, respectively, and $e(t)$, $\theta(t)$ are the envelope and phase, respectively.

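PSD of complex envelope: Applying the standard frequency domain relationship to the
time-windowed sample paths, we have the frequency domain relationship

$$X_{p,T_o}(f) = \frac{1}{2} X_{T_o}(f - f_c) + \frac{1}{2} X_{T_o}^*(-f - f_c)$$

Since $x_p$ is passband, the two terms on the right occupy non-overlapping frequency bands (around $f_c$ and $-f_c$), so the cross terms vanish when we take the squared magnitude. We therefore obtain

$$|X_{p,T_o}(f)|^2 = \frac{1}{4} |X_{T_o}(f - f_c)|^2 + \frac{1}{4} |X_{T_o}^*(-f - f_c)|^2 = \frac{1}{4} |X_{T_o}(f - f_c)|^2 + \frac{1}{4} |X_{T_o}(-f - f_c)|^2$$

Dividing by $T_o$ and letting $T_o \to \infty$, we obtain

$$S_{x_p}(f) = \frac{1}{4} S_x(f - f_c) + \frac{1}{4} S_x(-f - f_c) \qquad (5.122)$$

where $S_x(f)$ is baseband. Using (5.87), the one-sided passband PSD is given by

$$S_{x_p}^+(f) = \frac{1}{2} S_x(f - f_c) \qquad (5.123)$$

Similarly, we can go from passband to complex baseband using the formula

$$S_x(f) = 2 S_{x_p}^+(f + f_c) \qquad (5.124)$$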
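The conversions (5.122) through (5.124) can be written as small helper functions. This sketch assumes that the complex-envelope PSD is supplied as a Python function and that its bandwidth is smaller than $2f_c$, so the two shifted copies in (5.122) do not overlap. The white-noise example (complex-envelope PSD $2N_0$ over $|f| < B/2$, as in Section 5.D.1), the numerical values, and all function names are illustrative.

```python
def passband_psd(S_x, f, fc):
    """Two-sided passband PSD from the complex-envelope PSD, as in (5.122)."""
    return 0.25 * S_x(f - fc) + 0.25 * S_x(-f - fc)

def passband_psd_one_sided(S_x, f, fc):
    """One-sided passband PSD (5.123); zero for f <= 0 by construction."""
    return 0.5 * S_x(f - fc) if f > 0 else 0.0

def baseband_psd(S_plus, f, fc):
    """Complex-envelope PSD recovered from the one-sided passband PSD, as in (5.124)."""
    return 2.0 * S_plus(f + fc)

# Hypothetical numbers: passband white noise of bandwidth B centered at fc
N0, B, fc = 1.0, 2.0, 100.0

def S_n(f):
    """Complex-envelope PSD of passband white noise: 2*N0 for |f| < B/2."""
    return 2 * N0 if abs(f) < B / 2 else 0.0

print(passband_psd(S_n, fc, fc))            # two-sided level at f = fc: N0/2
print(passband_psd_one_sided(S_n, fc, fc))  # one-sided level at f = fc: N0
print(baseband_psd(lambda f: passband_psd_one_sided(S_n, f, fc), 0.0, fc))  # back to 2*N0
```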

What about the I and Q components? Consider the complex envelope $x(t) = x_c(t) + j x_s(t)$. Its
autocorrelation function is given by

$$R_x(\tau) = \overline{x(t) x^*(t - \tau)} = \overline{(x_c(t) + j x_s(t))(x_c(t - \tau) - j x_s(t - \tau))}$$

which yields

$$R_x(\tau) = (R_{x_c}(\tau) + R_{x_s}(\tau)) + j(R_{x_s,x_c}(\tau) - R_{x_c,x_s}(\tau))$$
$$= (R_{x_c}(\tau) + R_{x_s}(\tau)) + j(R_{x_s,x_c}(\tau) - R_{x_s,x_c}(-\tau))$$

Taking the Fourier transform, we obtain

$$S_x(f) = S_{x_c}(f) + S_{x_s}(f) + j\left(S_{x_s,x_c}(f) - S_{x_s,x_c}^*(f)\right) \qquad (5.125)$$

which simplifies to

$$S_x(f) = S_{x_c}(f) + S_{x_s}(f) - 2\, \mathrm{Im}\left(S_{x_s,x_c}(f)\right) \qquad (5.126)$$

For simplicity, we henceforth consider situations in which $R_{x_s,x_c}(\tau) \equiv 0$ (i.e., the I and Q components
are uncorrelated). Actually, for a given passband random process, even if the I and
Q components for a given frequency reference are uncorrelated, we can make them correlated
by shifting the frequency reference. However, such subtleties are not required for our purpose,
which is to model digitally modulated signals and receiver noise.
5.D.1 Baseband representation of passband white noise

Consider  passband  white  noise  as  shown  in  Figure  5.21.  If  we  choose  the  reference  frequency  as 
the  center  of  the  band,  then  we  get  a  simple  model  for  the  complex  envelope  and  the  I  and  Q 
components  of  the  noise,  as  depicted  in  Figure  5.31.  The  complex  envelope  has  PSD 

$$S_n(f) = 2N_0, \qquad |f| < B/2$$

and the I and Q components have PSDs and cross-spectrum given by

$$S_{n_c}(f) = S_{n_s}(f) = N_0, \qquad |f| < B/2$$
$$S_{n_s,n_c}(f) \equiv 0$$


[Figure 5.31 panels: PSD of I and Q components, $S_{n_c}(f) = S_{n_s}(f)$; PSD of complex envelope; frequency axis marked at $-B/2$ and $B/2$.]

Figure  5.31:  PSD  of  I  and  Q  components,  and  complex  envelope,  of  passband  white  noise. 


Note that the power of the complex envelope is $2N_0B$, which is twice the power of the corresponding
passband noise $n_p$. This is consistent with the convention in Chapter 2 for deterministic,
finite-energy signals, where the complex envelope has twice the energy of the corresponding passband
signal. Later, when we discuss digital communication receivers and their performance in
Chapter 6, we find it convenient to scale signals and noise in complex baseband such that we get
rid of this factor of two. In this case, we obtain that the PSDs of the I and Q components
are given by $S_{n_c}(f) = S_{n_s}(f) = N_0/2$.


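[Figure 5.32 block diagram, titled “Passband White Noise is Circularly Symmetric”: $n_p(t)$ is multiplied by $2\cos(2\pi f_c t - \theta)$ and passed through a lowpass filter to produce $n_b(t)$.]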


Figure 5.32: Circular symmetry implies that the PSD of the baseband noise $n_b(t)$ is independent
of $\theta$.


An  important  property  of  passband  white  noise  is  its  circular  symmetry:  the  statistics  of  the  I 
and  Q  components  are  unchanged  if  we  change  the  phase  reference.  To  understand  what  this 
means in practical terms, consider the downconversion operation shown in Figure 5.32, which
yields a baseband random process $n_b(t) = n_c(t) \cos\theta - n_s(t) \sin\theta$. Circular symmetry corresponds to the assumption that
the PSD of $n_b$ does not depend on $\theta$. Thus, it immediately implies that

$$S_{n_c}(f) \equiv S_{n_s}(f) \iff R_{n_c}(\tau) \equiv R_{n_s}(\tau) \qquad (5.127)$$

since $n_b = n_c$ for $\theta = 0$, and $n_b = n_s$ for $\theta = -\pi/2$, where $n_c$, $n_s$ are the I and Q components,
respectively, taking $f_c$ as a reference. Thus, changes in phase reference do not change the
statistics of the I and Q components.
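A simulation sketch of the downconverter in Figure 5.32, assuming I and Q noise samples that are independent with equal variance $N_0$ (a hypothetical discrete-time model, not a waveform-level simulation). Downconversion with phase $\theta$ followed by lowpass filtering leaves $n_b = n_c \cos\theta - n_s \sin\theta$, and the sketch shows that the power of $n_b$ does not change with $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
N0 = 1.0
num = 200_000
# Hypothetical model: I and Q samples independent, each N(0, N0)
n_c = rng.normal(scale=np.sqrt(N0), size=num)
n_s = rng.normal(scale=np.sqrt(N0), size=num)

def baseband_power(theta):
    """Power of the lowpass output of 2cos(2 pi fc t - theta) * n_p(t)."""
    n_b = n_c * np.cos(theta) - n_s * np.sin(theta)
    return np.mean(n_b**2)

for theta in np.linspace(0, np.pi, 5):
    print(f"theta={theta:.3f}: power of n_b = {baseband_power(theta):.4f}")
```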


5.E  SNR  Computations  for  Analog  Modulation 

We  now  compute  SNR  for  the  amplitude  and  angle  modulation  schemes  discussed  in  Chapter  3. 
Since  the  format  of  the  messages  is  not  restricted  in  our  analysis,  the  SNR  computations  apply 
to  digital  modulation  (where  the  messages  are  analog  waveforms  associated  with  a  particular 
sequence  of  bits  being  transmitted)  as  well  as  analog  modulation  (where  the  messages  are  typically 
“natural”  audio  or  video  waveforms  beyond  our  control).  However,  such  SNR  computations  are 
primarily  of  interest  for  analog  modulation,  since  the  performance  measure  of  interest  for  digital 
communication  systems  is  typically  probability  of  error. 


5.E.1  Noise  Model  and  SNR  Benchmark 

For noise modeling, we consider passband, circularly symmetric, white noise $n_p(t)$ in a system of bandwidth $B$ centered around $f_c$, with PSD as shown in Figure 5.21. As discussed in Section 5.D.1, we can write this in terms of its I and Q components with respect to reference frequency $f_c$ as

$$n_p(t) = n_c(t)\cos 2\pi f_c t - n_s(t)\sin 2\pi f_c t$$

where the relevant PSDs are given in Figure 5.31.

Baseband benchmark: When evaluating the SNR for various passband analog modulation schemes, it is useful to consider a hypothetical baseband system as a benchmark. Suppose that a real-valued message of bandwidth $B_m$ is sent over a baseband channel. The noise power over the baseband channel is given by $P_n = N_0 B_m$. If the received signal power is $P_r = P_m$, then the SNR benchmark for this baseband channel is given by:

$$\mathrm{SNR}_b = \frac{P_r}{N_0 B_m} \tag{5.128}$$


5.E.2  SNR  for  Amplitude  Modulation 

We  now  quickly  sketch  SNR  computations  for  some  of  the  variants  of  AM.  The  signal  and  power 
computations  are  similar  to  earlier  examples  in  this  chapter,  so  we  do  not  belabor  the  details. 

SNR for DSB-SC: For message bandwidth $B_m$, the bandwidth of the passband received signal is $B = 2B_m$. The received signal is given by

$$y_p(t) = A_c m(t)\cos(2\pi f_c t + \theta_r) + n_p(t)$$

where $\theta_r$ is the phase offset between the incoming carrier and the LO. The received signal power is given by

$$P_r = \overline{\left(A_c m(t)\cos(2\pi f_c t + \theta_r)\right)^2} = A_c^2 P_m/2$$
A coherent demodulator extracts the I component, which is given by

$$y_c(t) = A_c m(t)\cos\theta_r + n_c(t)$$

The signal power is

$$P_s = \overline{\left(A_c m(t)\cos\theta_r\right)^2} = A_c^2 P_m\cos^2\theta_r$$

while the noise power is

$$P_n = \overline{n_c^2(t)} = N_0 B = 2N_0 B_m$$

so that the SNR is

$$\mathrm{SNR}_{DSB} = \frac{A_c^2 P_m\cos^2\theta_r}{2N_0 B_m} = \frac{P_r}{N_0 B_m}\cos^2\theta_r = \mathrm{SNR}_b\cos^2\theta_r \tag{5.129}$$


For ideal coherent demodulation (i.e., $\theta_r = 0$), the SNR for DSB therefore equals the baseband benchmark $\mathrm{SNR}_b$ in (5.128). A phase error reduces it by the factor $\cos^2\theta_r$.

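SNR for SSB: For message bandwidth $B_m$, the bandwidth of the passband received signal is $B = B_m$. The received signal is given by

$$y_p(t) = A_c m(t)\cos(2\pi f_c t + \theta_r) \pm A_c \hat{m}(t)\sin(2\pi f_c t + \theta_r) + n_p(t)$$

where $\hat{m}(t)$ is the Hilbert transform of $m(t)$ and $\theta_r$ is the phase offset between the incoming carrier and the LO. Since $\hat{m}(t)$ has the same power $P_m$ as $m(t)$, the received signal power is given by

$$P_r = \overline{\left(A_c m(t)\cos(2\pi f_c t + \theta_r)\right)^2} + \overline{\left(A_c \hat{m}(t)\sin(2\pi f_c t + \theta_r)\right)^2} = A_c^2 P_m$$

A coherent demodulator extracts the I component, which is given by

$$y_c(t) = A_c m(t)\cos\theta_r \pm A_c \hat{m}(t)\sin\theta_r + n_c(t)$$

The signal power is

$$P_s = \overline{\left(A_c m(t)\cos\theta_r\right)^2} = A_c^2 P_m\cos^2\theta_r$$

while the noise-plus-interference power is

$$P_n = \overline{n_c^2(t)} + \overline{\left(A_c \hat{m}(t)\sin\theta_r\right)^2} = N_0 B + A_c^2 P_m\sin^2\theta_r = N_0 B_m + A_c^2 P_m\sin^2\theta_r$$

so that the signal-to-interference-plus-noise ratio (SINR) is

$$\mathrm{SINR}_{SSB} = \frac{A_c^2 P_m\cos^2\theta_r}{N_0 B_m + A_c^2 P_m\sin^2\theta_r} = \frac{P_r\cos^2\theta_r}{N_0 B_m + P_r\sin^2\theta_r} = \frac{\mathrm{SNR}_b\cos^2\theta_r}{1 + \mathrm{SNR}_b\sin^2\theta_r} \tag{5.130}$$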


This coincides with the baseband benchmark (5.128) for ideal coherent demodulation (i.e., $\theta_r = 0$). However, for $\theta_r \neq 0$ the SINR cannot exceed $\cot^2\theta_r$, even when the received signal power $P_r$ gets arbitrarily large relative to the noise power. This shows the importance of making the phase error as small as possible.
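To make the difference between (5.129) and (5.130) concrete, the following minimal sketch evaluates both for a hypothetical LO phase error of 10 degrees across several values of $\mathrm{SNR}_b$. The function names are illustrative only.

```python
import math

def to_db(x):
    return 10 * math.log10(x)

def snr_dsb(snr_b, theta_r):
    # (5.129): coherent DSB-SC; a phase error only scales the signal by cos(theta_r).
    return snr_b * math.cos(theta_r) ** 2

def sinr_ssb(snr_b, theta_r):
    # (5.130): coherent SSB; the Hilbert-transform term leaks in as interference.
    return snr_b * math.cos(theta_r) ** 2 / (1 + snr_b * math.sin(theta_r) ** 2)

theta_r = math.radians(10)            # hypothetical LO phase error
ceiling = 1 / math.tan(theta_r) ** 2  # cot^2(theta_r), the limiting SSB SINR
for snr_b_db in (10, 20, 30, 40):
    snr_b = 10 ** (snr_b_db / 10)
    print(f"SNR_b = {snr_b_db} dB: DSB {to_db(snr_dsb(snr_b, theta_r)):.2f} dB, "
          f"SSB {to_db(sinr_ssb(snr_b, theta_r)):.2f} dB")
print(f"SSB ceiling cot^2(theta_r) = {to_db(ceiling):.2f} dB")
```

DSB loses the same small amount (about 0.13 dB) at every $\mathrm{SNR}_b$. SSB instead flattens out just below its ceiling of roughly 15.1 dB, no matter how much received power is added.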

SNR for AM: Now, consider conventional AM. While we would typically use envelope detection rather than coherent demodulation in this setting, it is instructive to compute SNR for both methods of demodulation. For message bandwidth $B_m$, the bandwidth of the passband received signal is $B = 2B_m$. The received signal is given by

$$y_p(t) = A_c\left(1 + a_{mod} m_n(t)\right)\cos(2\pi f_c t + \theta_r) + n_p(t) \tag{5.131}$$
where $m_n(t)$ is the normalized version of the message (with $\min_t m_n(t) = -1$), and where $\theta_r$ is the phase offset between the incoming carrier and the LO. Assuming that $m_n(t)$ has no DC component, the received signal power is given by

$$P_r = \overline{\left(A_c\left(1 + a_{mod} m_n(t)\right)\cos(2\pi f_c t + \theta_r)\right)^2} = \frac{A_c^2}{2}\left(1 + a_{mod}^2 P_{m_n}\right) \tag{5.132}$$

where $P_{m_n} = \overline{m_n^2(t)}$ is the power of the normalized message. A coherent demodulator extracts the I component, which is given by

$$y_c(t) = \left(A_c + A_c a_{mod} m_n(t)\right)\cos\theta_r + n_c(t)$$

The power of the information-bearing part of the signal is given by the following expression. The DC term due to the carrier carries no information and is typically rejected using AC coupling.

$$P_s = \overline{\left(A_c a_{mod} m_n(t)\cos\theta_r\right)^2} = A_c^2 a_{mod}^2 P_{m_n}\cos^2\theta_r \tag{5.133}$$

Recall that the AM power efficiency is defined as the ratio of two powers: the power of the message-bearing part of the signal, and the power of the overall signal (which includes an unmodulated carrier). It is given by

$$\eta_{AM} = \frac{a_{mod}^2 P_{m_n}}{1 + a_{mod}^2 P_{m_n}}$$


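We can therefore write the signal power (5.133) at the output of the coherent demodulator in terms of the received power in (5.132) as:

$$P_s = 2P_r\eta_{AM}\cos^2\theta_r$$

while the noise power is

$$P_n = \overline{n_c^2(t)} = N_0 B = 2N_0 B_m$$

Thus, the SNR is

$$\mathrm{SNR}_{AM,coh} = \frac{P_s}{P_n} = \frac{2P_r\eta_{AM}\cos^2\theta_r}{2N_0 B_m} = \mathrm{SNR}_b\,\eta_{AM}\cos^2\theta_r \tag{5.134}$$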

Thus, even with ideal coherent demodulation ($\theta_r = 0$), the SNR obtained in AM is less than that of the baseband benchmark, since $\eta_{AM} < 1$ (typically much smaller than one). Of course, we incur this power inefficiency in order to simplify the receiver, which can then recover the message using an envelope detector. Let us now compute the SNR for the latter.
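Before turning to envelope detection, here is a numerical illustration of (5.134) in plain Python. It estimates $P_{m_n}$ from samples of a hypothetical sinusoidal message, then reports $\eta_{AM}$ and the SNR relative to the baseband benchmark for two modulation indices, assuming perfect phase ($\theta_r = 0$).

```python
import math

def am_power_efficiency(a_mod, p_mn):
    # eta_AM = a_mod^2 P_mn / (1 + a_mod^2 P_mn)
    return a_mod ** 2 * p_mn / (1 + a_mod ** 2 * p_mn)

def snr_am_coherent(snr_b, a_mod, p_mn, theta_r=0.0):
    # (5.134): SNR_AM,coh = SNR_b * eta_AM * cos^2(theta_r)
    return snr_b * am_power_efficiency(a_mod, p_mn) * math.cos(theta_r) ** 2

# Hypothetical normalized message: one period of a cosine, so min m_n(t) = -1.
samples = [math.cos(2 * math.pi * k / 1000) for k in range(1000)]
p_mn = sum(x * x for x in samples) / len(samples)  # very close to 1/2

for a_mod in (0.5, 1.0):
    eta = am_power_efficiency(a_mod, p_mn)
    rel_db = 10 * math.log10(snr_am_coherent(1.0, a_mod, p_mn))
    print(f"a_mod = {a_mod}: eta_AM = {eta:.3f}, SNR relative to SNR_b = {rel_db:.2f} dB")
```

For a full-scale sinusoid ($a_{mod} = 1$, $P_{m_n} = 1/2$) the efficiency is $\eta_{AM} = 1/3$. That is a penalty of about 4.8 dB with respect to the baseband benchmark, and the penalty grows as $a_{mod}$ is reduced.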


Figure  5.33:  At  high  SNR,  the  envelope  of  an  AM  signal  is  approximately  equal  to  its  I  component 
relative  to  the  received  carrier  phase  reference. 


Expressing the passband noise in the received signal (5.131) with the incoming carrier as the reference, we have

$$y_p(t) = A_c\left(1 + a_{mod} m_n(t)\right)\cos(2\pi f_c t + \theta_r) + n_c(t)\cos(2\pi f_c t + \theta_r) - n_s(t)\sin(2\pi f_c t + \theta_r)$$

where, by virtue of circular symmetry, $n_c, n_s$ have the PSDs and cross-spectra as in Figure 5.31, regardless of $\theta_r$. That is,

$$y_p(t) = y_c(t)\cos(2\pi f_c t + \theta_r) - y_s(t)\sin(2\pi f_c t + \theta_r)$$

where, as shown in Figure 5.33,

$$y_c(t) = A_c\left(1 + a_{mod} m_n(t)\right) + n_c(t), \qquad y_s(t) = n_s(t)$$

At high SNR, the signal term is dominant, so that $|y_c(t)| \gg |y_s(t)|$. Furthermore, the AM signal is positive (assuming $a_{mod} < 1$), so $y_c > 0$ “most of the time,” even though $n_c$ can be negative. We therefore obtain that

$$e(t) = \sqrt{y_c^2(t) + y_s^2(t)} \approx |y_c(t)| \approx y_c(t)$$

That is, the output of the envelope detector is approximated, for high SNR, as

$$e(t) \approx A_c\left(1 + a_{mod} m_n(t)\right) + n_c(t)$$

The right-hand side is what we would get from ideal coherent detection. We can reuse our SNR computation for coherent detection to conclude that the SNR at the envelope detector output is given by

$$\mathrm{SNR}_{AM,envdet} \approx \mathrm{SNR}_b\,\eta_{AM} \tag{5.135}$$

Thus, for a properly designed ($a_{mod} < 1$) AM system operating at high SNR, the envelope detector approximates the performance of ideal coherent detection, without requiring carrier synchronization.
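The high-SNR argument above can be checked by simulation. The NumPy sketch below works directly with the I and Q components of Figure 5.33 rather than a sampled passband waveform. It compares the output SNR of the ideal envelope detector $\sqrt{y_c^2 + y_s^2}$ with that of ideal coherent detection $y_c$, in each case after removing the DC carrier term. The message, modulation index, and noise level `sigma` are hypothetical choices. `sigma` is the standard deviation of each of $n_c$ and $n_s$, so $N_0 B = 2N_0 B_m = \sigma^2$ in the notation of this section.

```python
import numpy as np

rng = np.random.default_rng(1)
num = 100_000
A_c, a_mod = 1.0, 0.5
m_n = np.cos(2 * np.pi * np.arange(num) / 100)  # hypothetical message, min = -1
sigma = 0.05                                    # hypothetical noise level (high SNR)
n_c = sigma * rng.standard_normal(num)
n_s = sigma * rng.standard_normal(num)

y_c = A_c * (1 + a_mod * m_n) + n_c  # I component w.r.t. the received carrier
y_s = n_s                            # Q component: noise only

def output_snr(z):
    # Subtract the DC carrier term (AC coupling) and the ideal message term;
    # whatever remains is counted as noise.
    signal = A_c * a_mod * m_n
    noise = z - A_c - signal
    return np.mean(signal ** 2) / np.mean(noise ** 2)

snr_coh = output_snr(y_c)
snr_env = output_snr(np.hypot(y_c, y_s))  # ideal envelope detector

# Prediction from (5.134)/(5.135): SNR_b * eta_AM, with N_0 B_m = sigma^2 / 2.
p_mn = np.mean(m_n ** 2)
p_r = A_c ** 2 / 2 * (1 + a_mod ** 2 * p_mn)
snr_b = p_r / (sigma ** 2 / 2)
eta = a_mod ** 2 * p_mn / (1 + a_mod ** 2 * p_mn)
print(f"predicted {10 * np.log10(snr_b * eta):.2f} dB, "
      f"coherent {10 * np.log10(snr_coh):.2f} dB, "
      f"envelope {10 * np.log10(snr_env):.2f} dB")
```

At this noise level the two detectors agree to within a small fraction of a dB, and both match the prediction of about 17 dB. Increasing `sigma` until $y_c$ frequently goes negative makes the envelope detector fall behind, as the approximation $|y_c| \approx y_c$ suggests.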


5.E.3  SNR  for  Angle  Modulation 

We  have  seen  how  to  compute  SNR  when  white  noise  adds  to  a  message  encoded  in  the  signal 
amplitude.  Let  us  now  see  what  happens  when  the  message  is  encoded  in  the  signal  phase  or 
frequency.  The  received  signal  is  given  by 


$$y_p(t) = A_c\cos(2\pi f_c t + \theta(t)) + n_p(t) \tag{5.136}$$

where $n_p(t)$ is passband white noise with one-sided PSD $N_0$ over the signal band of interest, and where the message is encoded in the phase $\theta(t)$. For example,

$$\theta(t) = k_p m(t)$$

for phase modulation, and

$$\frac{1}{2\pi}\frac{d\theta(t)}{dt} = k_f m(t)$$


for  frequency  modulation.  We  wish  to  understand  how  the  additive  noise  np{t)  perturbs  the 
phase. 

Decomposing  the  passband  noise  into  I  and  Q  components  with  respect  to  the  phase  of  the 
noiseless  angle  modulated  signal,  we  can  rewrite  the  received  signal  as  follows: 


$$\begin{aligned} y_p(t) &= A_c\cos(2\pi f_c t + \theta(t)) + n_c(t)\cos(2\pi f_c t + \theta(t)) - n_s(t)\sin(2\pi f_c t + \theta(t)) \\ &= \left(A_c + n_c(t)\right)\cos(2\pi f_c t + \theta(t)) - n_s(t)\sin(2\pi f_c t + \theta(t)) \end{aligned} \tag{5.137}$$

Figure 5.34: I and Q components of a noisy angle modulated signal with the phase reference chosen as the phase of the noiseless signal.

where $n_c, n_s$ have PSDs as in Figure 5.31 (with cross-spectrum $S_{n_c,n_s}(f) = 0$), thanks to circular symmetry. We assume that circular symmetry applies approximately even though the phase reference $\theta(t)$ is time-varying. The I and Q components with respect to this phase reference are shown in Figure 5.34, so that the corresponding complex envelope can be written as

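$$y(t) = \left(A_c + n_c(t)\right) + j n_s(t) = e(t)e^{j\theta_n(t)} \tag{5.138}$$

where

$$e(t) = \sqrt{\left(A_c + n_c(t)\right)^2 + n_s^2(t)}$$

and

$$\theta_n(t) = \tan^{-1}\frac{n_s(t)}{A_c + n_c(t)} \tag{5.139}$$

The passband signal in (5.137) can now be rewritten as

$$y_p(t) = \mathrm{Re}\left(y(t)e^{j(2\pi f_c t + \theta(t))}\right) = \mathrm{Re}\left(e(t)e^{j(2\pi f_c t + \theta(t) + \theta_n(t))}\right) = e(t)\cos\left(2\pi f_c t + \theta(t) + \theta_n(t)\right)$$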

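At high SNR, $A_c \gg |n_c|$ and $A_c \gg |n_s|$. Thus,

$$\left|\frac{n_s(t)}{A_c + n_c(t)}\right| \ll 1$$

and

$$\frac{n_s(t)}{A_c + n_c(t)} \approx \frac{n_s(t)}{A_c}$$

For $|x|$ small, $\tan x \approx x$, and hence $x \approx \tan^{-1} x$. We therefore obtain the following high SNR approximation for the phase perturbation due to the noise:

$$\theta_n(t) = \tan^{-1}\frac{n_s(t)}{A_c + n_c(t)} \approx \frac{n_s(t)}{A_c + n_c(t)} \approx \frac{n_s(t)}{A_c} \quad \text{(high SNR approximation)} \tag{5.140}$$

To summarize, we can model the received signal (5.136) as

$$y_p(t) \approx A_c\cos\left(2\pi f_c t + \theta(t) + \frac{n_s(t)}{A_c}\right) \quad \text{(high SNR approximation)} \tag{5.141}$$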


Thus,  the  Q  component  (relative  to  the  desired  signal’s  phase  reference)  of  the  passband  white 
noise  appears  as  phase  noise,  but  is  scaled  down  by  the  signal  amplitude. 
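A quick Monte Carlo check of (5.140) follows as a NumPy sketch. It assumes $A_c = 1$ and independent Gaussian $n_c, n_s$ with a common, hypothetical standard deviation `sigma`, and compares the exact phase perturbation with the approximation $n_s/A_c$. The exact phase is computed with `np.arctan2`, which returns the angle of $(A_c + n_c) + jn_s$. This agrees with $\tan^{-1}\left(n_s/(A_c + n_c)\right)$ whenever $A_c + n_c > 0$.

```python
import numpy as np

def phase_error_stats(sigma, A_c=1.0, num=100_000, seed=2):
    rng = np.random.default_rng(seed)
    n_c = sigma * rng.standard_normal(num)
    n_s = sigma * rng.standard_normal(num)
    theta_exact = np.arctan2(n_s, A_c + n_c)  # angle of (A_c + n_c) + j n_s
    theta_approx = n_s / A_c                  # high SNR approximation (5.140)
    rel_err = np.mean((theta_exact - theta_approx) ** 2) / np.mean(theta_approx ** 2)
    return float(np.var(theta_exact)), float(rel_err)

for sigma in (0.05, 0.2, 1.0):
    var_theta, rel_err = phase_error_stats(sigma)
    print(f"sigma = {sigma}: var(theta_n) = {var_theta:.4f} "
          f"(approx sigma^2 / A_c^2 = {sigma ** 2:.4f}), relative error = {rel_err:.3f}")
```

For $\sigma = 0.05$ the phase noise variance is essentially $\sigma^2/A_c^2$ and the approximation error is negligible. For $\sigma = 1$ the denominator $A_c + n_c$ often changes sign and the approximation fails badly. This is the mechanism behind the threshold effect discussed below.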




FM  Noise  Analysis 

Let us apply the preceding to develop an analysis of the effects of white noise on FM. It is helpful, but not essential, to have read Chapter 3 for this discussion. Suppose that we have an ideal detector for the phase of the noisy signal in (5.141), and that we differentiate it to recover a message encoded in the frequency. (For those who have read Chapter 3, we are talking about an ideal limiter-discriminator.) The output is the instantaneous frequency deviation, given by

$$z(t) = \frac{1}{2\pi}\frac{d}{dt}\left(\theta(t) + \theta_n(t)\right) \approx k_f m(t) + \frac{1}{2\pi A_c}\frac{dn_s(t)}{dt} \tag{5.142}$$

using the high SNR approximation (5.140).


Figure  5.35:  Block  diagram  for  FM  system  using  limiter-discriminator  demodulation. 


Figure 5.36: PSDs of signal and noise before and after limiter-discriminator. (Panels: before the limiter-discriminator, the PSD of the noiseless FM signal, occupying bandwidth $B_{RF}$; after the limiter-discriminator.)


We now analyze the performance of an FM system whose block diagram is shown in Figure 5.35. For wideband FM, the bandwidth $B_{RF}$ of the received signal $y_p(t)$ is significantly larger than the message bandwidth $B_m$: $B_{RF} \approx 2(\beta + 1)B_m$ by Carson’s formula, where $\beta > 1$. Thus, the RF front end in Figure 5.35 lets in passband white noise $n_p(t)$ of bandwidth of the order of $B_{RF}$, as shown in Figure 5.36. Figure 5.36 also shows the PSDs once we have passed the received signal through the limiter-discriminator. The estimated message at the output of the limiter-discriminator is a baseband signal which we can limit to the message bandwidth $B_m$. This significantly reduces the noise that we see at the output of the limiter-discriminator. Let us now compute the output SNR. From (5.142), the signal power is given by

$$P_s = \overline{\left(k_f m(t)\right)^2} = k_f^2 P_m \tag{5.143}$$

The noise contribution at the output is given by

$$z_n(t) = \frac{1}{2\pi A_c}\frac{dn_s(t)}{dt}$$

Since $d/dt \leftrightarrow j2\pi f$, $z_n(t)$ is obtained by passing $n_s(t)$ through an LTI system with $G(f) = \frac{j2\pi f}{2\pi A_c} = \frac{jf}{A_c}$. Thus, the noise PSD at the output of the limiter-discriminator is given by

$$S_{z_n}(f) = |G(f)|^2 S_{n_s}(f) = \frac{f^2 N_0}{A_c^2}, \quad |f| < B_{RF}/2 \tag{5.144}$$

Once we limit the bandwidth to the message bandwidth $B_m$ after the discriminator, the noise power is given by

$$P_n = \int_{-B_m}^{B_m} S_{z_n}(f)\,df = \int_{-B_m}^{B_m} \frac{f^2 N_0}{A_c^2}\,df = \frac{2B_m^3 N_0}{3A_c^2} \tag{5.145}$$


From (5.143) and (5.145), we obtain that the SNR is given by

$$\mathrm{SNR}_{FM} = \frac{P_s}{P_n} = \frac{3k_f^2 P_m A_c^2}{2B_m^3 N_0}$$


It is interesting to benchmark this against a baseband communication system in which the message is sent directly over the channel. To keep the comparison fair, we fix the received power to that of the passband system and the one-sided noise PSD to that of the passband white noise. Thus, the received signal power is $P_r = A_c^2/2$, the noise power is $N_0 B_m$, and the baseband benchmark SNR is given by

$$\mathrm{SNR}_b = \frac{P_r}{N_0 B_m} = \frac{A_c^2}{2N_0 B_m}$$

We therefore obtain that

$$\mathrm{SNR}_{FM} = \frac{3k_f^2 P_m}{B_m^2}\,\mathrm{SNR}_b \tag{5.146}$$


Let us now express this in terms of some interesting parameters. The maximum frequency deviation in the FM system is given by

$$\Delta f_{max} = k_f \max_t |m(t)|$$

and the modulation index is defined as the ratio between the maximum frequency deviation and the message bandwidth:

$$\beta = \frac{\Delta f_{max}}{B_m}$$

Thus, we have

$$\frac{k_f^2 P_m}{B_m^2} = \frac{\beta^2 P_m}{\left(\max_t |m(t)|\right)^2} = \frac{\beta^2}{\mathrm{PAR}}$$

defining the peak-to-average power ratio (PAR) of the message as

$$\mathrm{PAR} = \frac{\left(\max_t |m(t)|\right)^2}{\overline{m^2(t)}} = \frac{\left(\max_t |m(t)|\right)^2}{P_m}$$

Substituting into (5.146), we obtain that

$$\mathrm{SNR}_{FM} = \frac{3\beta^2}{\mathrm{PAR}}\,\mathrm{SNR}_b \tag{5.147}$$

Thus, FM can improve upon the baseband benchmark by increasing the modulation index $\beta$.
This  is  an  example  of  a  power-bandwidth  tradeoff:  by  increasing  the  bandwidth  beyond  that 
strictly  necessary  for  sending  the  message,  we  have  managed  to  improve  the  SNR  compared  to  the 
baseband  benchmark.  However,  the  quadratic  power-bandwidth  tradeoff  offered  by  FM  is  highly 
suboptimal  compared  to  the  best  possible  tradeoffs  in  digital  communication  systems,  where 
one  can  achieve  exponential  tradeoffs.  Another  drawback  of  the  FM  power-bandwidth  tradeoff 
is  that  the  amount  of  SNR  improvement  depends  on  the  PAR  of  the  message:  messages  with 
larger  dynamic  range,  and  hence  larger  PAR,  will  see  less  improvement.  This  is  in  contrast  to 
digital  communication,  where  message  characteristics  do  not  affect  the  power-bandwidth  tradeoffs 
over  the  communication  link,  since  messages  are  converted  to  bits  via  source  coding  before 
transmission.  Of  course,  messages  with  larger  dynamic  range  may  well  require  more  bits  to 
represent  them  accurately,  and  hence  a  higher  rate  on  the  communication  link,  but  such  design 
choices  are  decoupled  from  the  parameters  governing  reliable  link  operation. 
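To put numbers on (5.147) and Carson's formula, the sketch below assumes a sinusoidal message (its PAR of 2 is computed here from samples) and a few hypothetical modulation indices. For each index it reports the FM gain over the baseband benchmark next to the RF bandwidth expansion $B_{RF}/B_m \approx 2(\beta + 1)$.

```python
import math

def par(samples):
    # Peak-to-average power ratio: (max |m|)^2 / mean(m^2)
    peak = max(abs(x) for x in samples)
    return peak ** 2 / (sum(x * x for x in samples) / len(samples))

def fm_gain(beta, par_value):
    # (5.147): SNR_FM / SNR_b = 3 beta^2 / PAR
    return 3 * beta ** 2 / par_value

tone = [math.cos(2 * math.pi * k / 1000) for k in range(1000)]  # hypothetical message
par_tone = par(tone)
for beta in (1, 2, 5, 10):
    gain = fm_gain(beta, par_tone)
    print(f"beta = {beta:2d}: B_RF/B_m = {2 * (beta + 1):2d}, "
          f"gain = {gain:7.1f} ({10 * math.log10(gain):.1f} dB)")
```

Doubling $\beta$ from 5 to 10 buys only about 6 dB of SNR, while the RF bandwidth grows from roughly 12 to 22 times $B_m$. This is the quadratic tradeoff described above. A message with a larger PAR would see correspondingly smaller gains.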

Threshold effect: It appears from (5.147) that the output SNR can be improved simply by increasing $\beta$. This is somewhat misleading. For a given message bandwidth $B_m$, increasing $\beta$ corresponds to increasing the RF bandwidth: $B_{RF} \approx 2(\beta + 1)B_m$ by Carson’s formula. Thus, an increase in $\beta$ corresponds to an increase in the power of the passband white noise at the input of the limiter-discriminator, which is given by $N_0 B_{RF} = 2N_0 B_m(\beta + 1)$. Thus, if we increase $\beta$ while keeping the received power fixed, the high SNR approximation underlying (5.140), and hence the model (5.142) for the output of the limiter-discriminator, eventually breaks down. It is easy to see this from the equation (5.139) for the phase perturbation due to noise: $\theta_n(t) = \tan^{-1}\frac{n_s(t)}{A_c + n_c(t)}$. When $A_c$ is small relative to the noise, variations in $n_c(t)$ can change the sign of the denominator, which leads to phase changes of $\pi$ over a small time interval. This leads to impulses in the output of the discriminator. Indeed, as we start reducing the SNR at the input to the discriminator for FM audio below the threshold where the approximation (5.140) holds, we can actually hear these peaks as “clicks” in the audio output. As we reduce the SNR further, the clicks swamp out the desired signal. This is called the FM threshold effect.

To avoid this behavior, we must operate in the high-SNR regime where $A_c \gg |n_c|, |n_s|$, so that the approximation (5.140) holds. In other words, the SNR for the passband signal at the input to the limiter-discriminator must be above a threshold, say $\gamma$ (e.g., $\gamma = 10$ might be a good rule of thumb), for FM demodulation to work well. This condition can be expressed as follows:

$$\frac{P_r}{N_0 B_{RF}} > \gamma \tag{5.148}$$


Thus, in order to utilize a large RF bandwidth to improve SNR at the output of the limiter-discriminator, the received signal power must also scale with the available bandwidth. Using Carson’s formula, we can rewrite (5.148) in terms of the baseband benchmark as follows:

$$\mathrm{SNR}_b = \frac{P_r}{N_0 B_m} > 2\gamma(\beta + 1), \quad \text{condition for operation above threshold} \tag{5.149}$$

To summarize, the power-bandwidth tradeoff (5.147) applies only when the received power (or equivalently, the baseband benchmark SNR) is above a threshold that scales with the bandwidth, as specified by (5.149).
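The following sketch combines (5.147) and (5.149) to show how the threshold constrains the tradeoff. It uses the rule-of-thumb $\gamma = 10$ mentioned above and assumes a message with PAR = 2; the list of $\beta$ values is illustrative. For each $\beta$ it prints the minimum baseband benchmark SNR needed to stay above threshold, and the output SNR that FM then delivers.

```python
import math

def db(x):
    return 10 * math.log10(x)

def min_snr_b(beta, gamma=10.0):
    # (5.149): SNR_b must exceed 2 * gamma * (beta + 1)
    return 2 * gamma * (beta + 1)

def snr_fm(snr_b, beta, par_value=2.0):
    # (5.147); valid only above threshold
    return 3 * beta ** 2 / par_value * snr_b

for beta in (2, 5, 10, 20):
    snr_b_min = min_snr_b(beta)
    print(f"beta = {beta:2d}: need SNR_b > {db(snr_b_min):.1f} dB; "
          f"output SNR at threshold = {db(snr_fm(snr_b_min, beta)):.1f} dB")
```

For $\beta = 5$, the link needs $\mathrm{SNR}_b$ above about 20.8 dB before the promised 15.7 dB improvement (to about 36.5 dB at threshold) is actually available.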
Preemphasis and Deemphasis


Since the noise at the limiter-discriminator output has a quadratic PSD (see (5.144) and Figure 5.36), higher frequencies in the message see more noise than lower frequencies. A commonly used approach to alleviate this problem is to boost the power of the higher message frequencies at the transmitter by using a highpass preemphasis filter. The distortion in the message due to preemphasis is undone at the receiver using a lowpass deemphasis filter, which attenuates the higher frequencies. The block diagram of an FM system using such an approach is shown in Figure 5.37.


Figure  5.37:  Preemphasis  and  deemphasis  in  FM  systems. 


A typical choice for the preemphasis filter is a highpass filter with a single zero, with transfer function of the form

$$H_{PE}(f) = 1 + j2\pi f\tau_1$$

The corresponding deemphasis filter is a single pole lowpass filter with transfer function

$$H_{DE}(f) = \frac{1}{1 + j2\pi f\tau_1}$$

For FM audio broadcast, $\tau_1$ is chosen in the range 50 to 75 μs (e.g., 75 μs in the United States, 50 μs in Europe). The $f^2$ noise scaling at the output of the limiter-discriminator is compensated by the (approximately) $1/f^2$ scaling provided by $|H_{DE}(f)|^2$ beyond the cutoff frequency $f_{pd} = \frac{1}{2\pi\tau_1}$. The subscript indicates the use of preemphasis and deemphasis. This cutoff evaluates to 2.1 kHz for $\tau_1 = 75$ μs.
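A short NumPy sketch of the two filters above, for the U.S. value $\tau_1 = 75$ μs, follows. It computes $f_{pd} = 1/(2\pi\tau_1)$ and confirms that the cascade $H_{PE}(f)H_{DE}(f)$ is exactly one. It also shows that the deemphasized noise shape $|H_{DE}(f)|^2 f^2$ levels off near $f_{pd}^2$ well above the cutoff, which is the $1/f^2$ compensation described in the text. The test frequencies are arbitrary choices.

```python
import numpy as np

tau_1 = 75e-6                          # deemphasis time constant (U.S. value)
f_pd = 1 / (2 * np.pi * tau_1)         # cutoff frequency, about 2.12 kHz

f = np.array([100.0, 1e3, 5e3, 15e3])  # test frequencies in Hz (arbitrary)
H_pe = 1 + 1j * 2 * np.pi * f * tau_1        # preemphasis: single zero
H_de = 1 / (1 + 1j * 2 * np.pi * f * tau_1)  # deemphasis: single pole

cascade = H_pe * H_de                  # equals 1 at every frequency
shaped = np.abs(H_de) ** 2 * f ** 2    # f^2 noise shape after deemphasis (N0/A_c^2 = 1)

print(f"f_pd = {f_pd:.1f} Hz")
for fi, s in zip(f, shaped):
    print(f"f = {fi:7.0f} Hz: |H_DE|^2 f^2 / f_pd^2 = {s / f_pd ** 2:.3f}")
```

The normalized noise shape rises like $(f/f_{pd})^2$ below the cutoff and saturates toward 1 above it, instead of growing without bound.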

Let us compute the SNR improvement obtained using this strategy. Assume that the preemphasis and deemphasis filters compensate each other exactly. Then the signal contribution to the estimated message at the output of the deemphasis filter in Figure 5.37 is $k_f m(t)$. This equals the signal contribution at the output of the limiter-discriminator in Figure 5.35, which shows a system not using preemphasis/deemphasis. Since the signal contributions in the estimated messages in both systems are the same, any improvement in SNR must come from a reduction in the output noise. Thus, we wish to characterize the noise PSD and power at the output of the deemphasis filter in Figure 5.37. To do this, note that the noise at the output of the limiter-discriminator is the same as before:

$$z_n(t) = \frac{1}{2\pi A_c}\frac{dn_s(t)}{dt}$$

with PSD

$$S_{z_n}(f) = |G(f)|^2 S_{n_s}(f) = \frac{f^2 N_0}{A_c^2}, \quad |f| < B_{RF}/2$$
The noise $v_n$ obtained by passing $z_n$ through the deemphasis filter has PSD

$$S_{v_n}(f) = |H_{DE}(f)|^2 S_{z_n}(f) = \frac{N_0}{A_c^2}\,\frac{f^2}{1 + (f/f_{pd})^2} = \frac{N_0 f_{pd}^2}{A_c^2}\left(1 - \frac{1}{1 + (f/f_{pd})^2}\right)$$

Integrating over the message bandwidth, we find that the noise power in the estimated message in Figure 5.37 is given by

$$P_n = \int_{-B_m}^{B_m} S_{v_n}(f)\,df = \frac{2N_0 f_{pd}^3}{A_c^2}\left(\frac{B_m}{f_{pd}} - \tan^{-1}\frac{B_m}{f_{pd}}\right) \tag{5.150}$$


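where we have used the substitution $\tan x = f/f_{pd}$ to evaluate the integral. As we have already mentioned, the signal power is unchanged from the earlier analysis. The improvement in SNR is therefore given by the reduction in noise power compared with (5.145), which gives

$$\mathrm{SNR}_{gain} = \frac{2B_m^3 N_0/(3A_c^2)}{\frac{2N_0 f_{pd}^3}{A_c^2}\left(\frac{B_m}{f_{pd}} - \tan^{-1}\frac{B_m}{f_{pd}}\right)} = \frac{\left(\frac{B_m}{f_{pd}}\right)^3}{3\left(\frac{B_m}{f_{pd}} - \tan^{-1}\frac{B_m}{f_{pd}}\right)} \tag{5.151}$$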


For $f_{pd} = 2.1$ kHz, corresponding to the United States guidelines for FM audio broadcast, and an audio bandwidth $B_m = 15$ kHz, the SNR gain in (5.151) evaluates to more than 13 dB.
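The 13 dB figure can be reproduced from (5.151). The NumPy sketch below evaluates the closed form. As a cross-check on the $\tan x = f/f_{pd}$ substitution, it also integrates the noise PSDs with and without deemphasis numerically, using a hand-written trapezoidal rule. It works in units where $N_0/A_c^2 = 1$; this assumption cancels in the ratio.

```python
import numpy as np

def gain_closed_form(b_m, f_pd):
    # (5.151): (B_m/f_pd)^3 / (3 (B_m/f_pd - atan(B_m/f_pd)))
    r = b_m / f_pd
    return r ** 3 / (3 * (r - np.arctan(r)))

def trapezoid(y, x):
    # Plain trapezoidal rule, written out to avoid NumPy version differences.
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

def gain_numeric(b_m, f_pd, points=200_001):
    f = np.linspace(-b_m, b_m, points)
    s_without = f ** 2                       # (5.144) with N0/A_c^2 = 1
    s_with = f ** 2 / (1 + (f / f_pd) ** 2)  # after the deemphasis filter
    return trapezoid(s_without, f) / trapezoid(s_with, f)

b_m, f_pd = 15e3, 2.1e3
g = gain_closed_form(b_m, f_pd)
print(f"closed form: {g:.2f} ({10 * np.log10(g):.2f} dB)")
print(f"numeric:     {gain_numeric(b_m, f_pd):.2f}")
```

Both evaluations give a gain of about 21.3, or roughly 13.3 dB.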


For completeness, we give the formula for the SNR obtained using preemphasis and deemphasis as

$$\mathrm{SNR}_{FM,pd} = \frac{\beta^2\left(\frac{B_m}{f_{pd}}\right)^3}{\left(\frac{B_m}{f_{pd}} - \tan^{-1}\frac{B_m}{f_{pd}}\right)\mathrm{PAR}}\,\mathrm{SNR}_b \tag{5.152}$$

which is obtained by taking the product of the SNR gain (5.151) and the SNR without preemphasis/deemphasis given by (5.147).




Chapter  6 

Optimal  Demodulation 


As we saw in Chapter 4, we can send bits over a channel by choosing one of a set of waveforms to send. For example, when sending a single 16QAM symbol, we are choosing one of 16 passband waveforms:

$$b_c p(t)\cos 2\pi f_c t - b_s p(t)\sin 2\pi f_c t$$

where $b_c, b_s$ each take values in $\{\pm 1, \pm 3\}$. We are thus able to transmit $\log_2 16 = 4$ bits of information. In this chapter, we establish a framework for recovering these 4 bits when the received waveform is a noisy version of the transmitted waveform. More generally, we consider the fundamental problem of M-ary signaling in additive white Gaussian noise (AWGN): one of $M$ signals, $s_1(t), \ldots, s_M(t)$, is sent, and the received signal equals the transmitted signal plus white Gaussian noise (WGN).
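The following NumPy sketch makes the 16QAM example above concrete. It enumerates all $(b_c, b_s)$ pairs and builds the corresponding sampled passband waveforms. Several choices here are hypothetical and serve only for illustration: the rectangular pulse $p(t)$, the carrier frequency, the sampling grid, and the bit-to-symbol labeling (a plain split of the 4-bit index).

```python
import numpy as np

levels = np.array([-3, -1, 1, 3])  # values taken by b_c and b_s
f_c = 4.0                          # hypothetical carrier, in cycles per symbol interval
t = np.arange(0, 1, 1 / 64)        # one symbol interval, 64 samples
p = np.ones_like(t)                # hypothetical rectangular pulse p(t)

def qam16_waveform(index):
    # Hypothetical labeling: the two high bits pick b_c, the two low bits pick b_s.
    b_c, b_s = levels[index >> 2], levels[index & 0b11]
    return b_c * p * np.cos(2 * np.pi * f_c * t) - b_s * p * np.sin(2 * np.pi * f_c * t)

waveforms = [qam16_waveform(k) for k in range(16)]
bits_per_symbol = np.log2(len(waveforms))
print(f"{len(waveforms)} waveforms -> {bits_per_symbol:.0f} bits per symbol")
```

All 16 waveforms are distinct. The receiver's job, developed in this chapter, is to decide which one was sent when only a noisy version is observed.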

At  the  receiver,  we  are  faced  with  a  hypothesis  testing  problem:  we  have  M  possible  hypotheses 
about  which  signal  was  sent,  and  we  have  to  make  our  “best”  guess  as  to  which  one  holds,  based 
on  our  observation  of  the  received  signal.  We  are  interested  in  finding  a  guessing  strategy,  more 
formally termed a decision rule, which is the “best” according to some criterion. For communications applications, we are typically interested in finding a decision rule which minimizes the
probability  of  error  (i.e.,  the  probability  of  making  a  wrong  guess).  We  can  now  summarize  the 
goals  of  this  chapter  as  follows. 

Goals: We wish to design optimal receivers when the received signal is modeled as follows:

$$H_i: y(t) = s_i(t) + n(t), \quad i = 1, \ldots, M$$

where $H_i$ is the $i$th hypothesis, corresponding to signal $s_i(t)$ being transmitted, and where $n(t)$ is white Gaussian noise. We then wish to analyze the performance of such receivers, to see how performance measures such as the probability of error depend on system parameters. It turns out that, for the preceding AWGN model, the performance depends only on the received signal-to-noise ratio (SNR) and on the “shape” of the signal constellation $\{s_1(t), \ldots, s_M(t)\}$. Underlying both the derivation of the optimal receiver and its analysis is a geometric view of signals and noise as vectors, which we term signal space concepts. Once we have this background, we are in a position to discuss elementary power-bandwidth tradeoffs. For example, 16QAM has higher bandwidth efficiency than QPSK, so it makes sense that it has lower power efficiency; that is, it requires higher SNR, and hence higher transmit power, for the same probability of error. We will be able to quantify this intuition, previewed in Chapter 4, based on the material in this chapter. We will also be able to perform link budget calculations: for example, how much transmit power is needed to attain a given bit rate using a given constellation as a function of range, and transmit and receive antenna gains?

Chapter Plan: The prerequisites for this chapter are Chapter 4 (digital modulation) and the material on Gaussian random variables (Section 5.6) and noise modeling (Section 5.8) in Chapter 5. We build up the remaining background required to attain our goals in this chapter in a step-by-step fashion, as follows.

Hypothesis testing: In Section 6.1, we establish the basic framework for hypothesis testing, derive the form of optimal decision rules, and illustrate the application of this framework for finite-dimensional observations.

Signal space concepts: In Section 6.2, we show that continuous time M-ary signaling in AWGN can be reduced to an equivalent finite-dimensional system, in which transmitted signal vectors are corrupted by vector WGN. This is done by projecting the continuous time signal into the finite-dimensional signal space spanned by the set of possible transmitted signals, $s_1, \ldots, s_M$. We apply the hypothesis testing framework to derive the optimal receiver for the finite-dimensional system, and from this we infer the optimal receiver in continuous time.

Performance analysis: In Section 6.3, we analyze the performance of optimal reception. We show that performance depends only on SNR and the relative geometry of the signal constellation. We provide exact error probability expressions for binary signaling. The probability of error for larger signal constellations must typically be computed by simulation or numerical integration. Building on the analysis for binary signaling, we obtain bounds and approximations for these constellations that provide quick insight into power-bandwidth tradeoffs.

Link budget analysis: In Section 6.5, we illustrate how performance analysis is applied to obtain the “link budget” for a typical radio link. The link budget is the tool used to obtain coarse guidelines for the design of hardware, including transmit power, transmit and receive antennas, and receiver noise figure.

Notational shortcut: In this chapter, we make extensive use of the notational simplification discussed at the end of Section 5.3. Given a random variable $X$, a common notation for probability density function or probability mass function is $p_X(x)$, with $X$ denoting the random variable, and $x$ being a dummy variable which we might integrate out when computing probabilities. However, when there is no scope for confusion, we use the less cumbersome (albeit incomplete) notation $p(x)$. Here the dummy variable $x$ serves not only as the argument of the density, but also to indicate that the density corresponds to the random variable $X$. (Similarly, we would use $p(y)$ to denote the density for a random variable $Y$.) The same convention is used for joint and conditional densities as well. For random variables $X$ and $Y$, we use the notation $p(x,y)$ instead of $p_{X,Y}(x,y)$, and $p(y|x)$ instead of $p_{Y|X}(y|x)$, to denote the joint and conditional densities, respectively.


6.1  Hypothesis  Testing 


In Example 5.6.3, we considered a simple model for binary signaling, in which the receiver sees a single sample $Y$. If 0 is sent, the conditional distribution of $Y$ is $N(0, v^2)$, while if 1 is sent, the conditional distribution is $N(m, v^2)$. We analyzed a simple decision rule in which we guess that 0 is sent if $Y \le m/2$, and guess that 1 is sent otherwise. Thus, we wish to decide between two hypotheses (0 being sent or 1 being sent) based on an observation (the received sample $Y$). The statistics of the observation depend on the hypothesis. This information is captured by the conditional distributions of $Y$ given each hypothesis. We must now make a good guess as to which hypothesis is true, based on the value of the observation. The guessing strategy is called the decision rule, which maps each possible value of $Y$ to either 0 or 1.

The  decision  rule  we  have  considered  in  Example  5.6.3  makes  sense,  splitting  the  difference 
between  the  conditional  means  of  Y  under  the  two  hypotheses.  But  is  it  always  the  best  thing 
to  do?  For  example,  if  we  know  for  sure  that  0  is  sent,  then  we  should  clearly  always  guess  that 
0  is  sent,  regardless  of  the  value  of  Y  that  we  see.  As  another  example,  if  the  noise  variance  is 
different  under  the  two  hypotheses,  then  it  is  no  longer  clear  that  splitting  the  difference  between 
the  means  is  the  right  thing  to  do.  We  therefore  need  a  systematic  framework  for  hypothesis 
testing,  which  allows  us  to  derive  good  decision  rules  for  a  variety  of  statistical  models. 
In this section, we consider the general problem of M-ary hypothesis testing, in which we must decide which of $M$ possible hypotheses, $H_0, \ldots, H_{M-1}$, “best explains” an observation $Y$. For our purpose, the observation $Y$ can be a scalar or vector, and takes values in an observation space $\Gamma$. The link between the hypotheses and observation is statistical: for each hypothesis $H_i$, we know the conditional distribution of $Y$ given $H_i$. We denote the conditional density of $Y$ given $H_i$ as $p(y|i)$, $i = 0, 1, \ldots, M-1$. We may also know the prior probabilities of the hypotheses (i.e., the probability of each hypothesis prior to seeing the observation). These are denoted by $\pi_i = P[H_i]$, $i = 0, 1, \ldots, M-1$, and satisfy $\sum_{i=0}^{M-1} \pi_i = 1$. The final ingredient of the hypothesis testing framework is the decision rule: for each possible value $Y = y$ of the observation, we must decide which of the $M$ hypotheses we will bet on. Denote this guess as $\delta(y)$. The decision rule $\delta(\cdot)$ is then a mapping from the observation space $\Gamma$ to $\{0, 1, \ldots, M-1\}$, where $\delta(y) = i$ means that we guess that $H_i$ is true when we see $Y = y$. The decision rule partitions the observation space into decision regions, with $\Gamma_i$ denoting the set of values of $Y$ for which we guess $H_i$. That is, $\Gamma_i = \{y \in \Gamma : \delta(y) = i\}$, $i = 0, 1, \ldots, M-1$. We summarize these ingredients of the hypothesis testing framework as follows.

Ingredients of hypothesis testing framework

• Hypotheses $H_0, H_1, \ldots, H_{M-1}$

• Observation $Y \in \Gamma$

• Conditional densities $p(y|i)$, for $i = 0, 1, \ldots, M-1$

• Prior probabilities $\pi_i = P[H_i]$, $i = 0, 1, \ldots, M-1$, with $\sum_{i=0}^{M-1} \pi_i = 1$

• Decision rule $\delta : \Gamma \to \{0, 1, \ldots, M-1\}$

• Decision regions $\Gamma_i = \{y \in \Gamma : \delta(y) = i\}$, $i = 0, 1, \ldots, M-1$


To make the concepts concrete, let us quickly recall Example 5.6.3, where we have $M = 2$ hypotheses, with $H_0 : Y \sim N(0, v^2)$ and $H_1 : Y \sim N(m, v^2)$. The “sensible” decision rule in this example can be written as

$$\delta(y) = \begin{cases} 0, & y \le m/2 \\ 1, & y > m/2 \end{cases}$$

so that $\Gamma_0 = (-\infty, m/2]$ and $\Gamma_1 = (m/2, \infty)$. Note that this decision rule need not be optimal if we know the prior probabilities. For example, if we know that $\pi_0 = 1$, we should say that $H_0$ is true, regardless of the value of $Y$: this would reduce the probability of error from $Q\left(\frac{m}{2v}\right)$ (for the “sensible” rule) to zero!


6.1.1 Error probabilities

The performance measures of interest to us when choosing a decision rule are the conditional error probabilities and the average error probability. We have already seen these in Example 5.6.3 for binary on-off keying, but we now formally define them for a general M-ary hypothesis testing problem. For a fixed decision rule $\delta$ with corresponding decision regions $\{\Gamma_i\}$, we define the conditional probabilities of error as follows.

Conditional Error Probabilities: The conditional error probability, conditioned on $H_i$, where $0 \le i \le M-1$, is defined as

$$P_{e|i} = P[\text{say } H_j \text{ for some } j \ne i \mid H_i \text{ is true}] = P[Y \notin \Gamma_i \mid H_i] = 1 - P[Y \in \Gamma_i \mid H_i] \tag{6.1}$$

Conditional Probabilities of Correct Decision: These are defined as

$$P_{c|i} = 1 - P_{e|i} = P[Y \in \Gamma_i \mid H_i] \tag{6.2}$$

Average Error Probability: This is given by averaging the conditional error probabilities using the priors:

$$P_e = \sum_{i=0}^{M-1} \pi_i P_{e|i} \tag{6.3}$$

Average Probability of Correct Decision: This is given by

$$P_c = \sum_{i=0}^{M-1} \pi_i P_{c|i} = 1 - P_e \tag{6.4}$$
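Definitions (6.1) and (6.3) can be made concrete by simulation. The following is a minimal Monte Carlo sketch, assuming the two-hypothesis Gaussian setting recalled above with illustrative values $m = 2$, $v = 1$ and equal priors; `estimate_error_probs` and `q_function` are hypothetical helper names, not library functions. For the “sensible” rule, the estimates should land near $Q(m/(2v)) = Q(1)$.

```python
import math
import random

def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def estimate_error_probs(decide, samplers, priors, trials=100_000, seed=1):
    """Monte Carlo estimates of the conditional error probabilities (6.1)
    and the average error probability (6.3) of a decision rule.

    decide   -- decision rule delta(y), returning an index in {0, ..., M-1}
    samplers -- samplers[i](rng) draws one observation Y under H_i
    priors   -- prior probabilities pi_0, ..., pi_{M-1}
    """
    rng = random.Random(seed)
    cond_err = []
    for i, sample in enumerate(samplers):
        # count how often the rule says something other than i while H_i is true
        errors = sum(1 for _ in range(trials) if decide(sample(rng)) != i)
        cond_err.append(errors / trials)
    avg_err = sum(p * pe for p, pe in zip(priors, cond_err))
    return cond_err, avg_err

# H_0: N(0, v^2) versus H_1: N(m, v^2), with illustrative m = 2, v = 1
m, v = 2.0, 1.0
samplers = [lambda rng: rng.gauss(0.0, v), lambda rng: rng.gauss(m, v)]

def sensible(y):
    return 0 if y <= m / 2 else 1

cond_err, avg_err = estimate_error_probs(sensible, samplers, [0.5, 0.5])
print(cond_err, avg_err, q_function(m / (2 * v)))
```

The decision rule is passed in as an ordinary function, so the same estimator applies unchanged to any of the rules derived below.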


6.1.2 ML and MAP decision rules

For a general M-ary hypothesis testing problem, an intuitively pleasing decision rule is the maximum likelihood rule, which, for a given observation $Y = y$, picks the hypothesis $H_i$ under which the observed value $Y = y$ is most likely; that is, we pick $i$ so as to maximize the conditional density $p(y|i)$.

Notation: We denote by “arg max” the argument of the maximum. That is, if the maximum of a function $f(x)$ occurs at $x_0$, then $x_0$ is the argument of the maximum:

$$\max_x f(x) = f(x_0), \qquad \arg\max_x f(x) = x_0$$

Note also that applying a strictly increasing function to $f$ changes the maximum value, but leaves the argument of the maximum unchanged. For example, when dealing with densities taking exponential forms (such as the Gaussian), it is useful to apply the logarithm (which is a strictly increasing function), as we note for the ML rule below.

Maximum Likelihood (ML) Decision Rule: The ML decision rule is defined as

$$\delta_{ML}(y) = \arg\max_{0 \le i \le M-1} p(y|i) = \arg\max_{0 \le i \le M-1} \log p(y|i) \tag{6.5}$$


Another decision rule that “makes sense” is the Maximum A Posteriori Probability (MAP) rule, where we pick the hypothesis which is most likely, conditioned on the value of the observation. The conditional probabilities $P[H_i|Y = y]$ are called the a posteriori, or posterior, probabilities, since they are probabilities that we can compute after we see the observation $Y = y$. Let us work through what this rule is actually doing. Using Bayes’ rule, the posterior probabilities are given by

$$P[H_i|Y = y] = \frac{P[H_i]\, p(y|i)}{p(y)} = \frac{\pi_i\, p(y|i)}{p(y)}, \qquad i = 0, 1, \ldots, M-1$$


Since  we  want  to  maximize  this  over  i,  the  denominator  p{y),  the  unconditional  density  of  Y, 
can  be  ignored  in  the  maximization.  We  can  also  take  the  log  as  we  did  for  the  ML  rule.  The 
MAP  rule  can  therefore  be  summarized  as  follows. 

Maximum A Posteriori Probability (MAP) Rule: The MAP decision rule is defined as

$$\delta_{MAP}(y) = \arg\max_{0 \le i \le M-1} P[H_i|Y = y] = \arg\max_{0 \le i \le M-1} \pi_i\, p(y|i) = \arg\max_{0 \le i \le M-1} \left(\log \pi_i + \log p(y|i)\right) \tag{6.6}$$

Properties of the MAP rule:

• The MAP rule reduces to the ML rule for equal priors.

• The MAP rule minimizes the probability of error. In other words, it is also the Minimum Probability of Error (MPE) rule.

The first property follows from (6.6) by setting $\pi_i = 1/M$: in this case $\pi_i$ does not depend on $i$ and can therefore be dropped when maximizing over $i$. The second property is important enough to restate and prove as a theorem.
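Rules (6.5) and (6.6) translate directly into code. The sketch below assumes each hypothesis is described by a function returning $\log p(y|i)$ and that all priors are strictly positive (so that $\log \pi_i$ is defined); `ml_rule`, `map_rule` and `gaussian_loglik` are hypothetical names. The basic Gaussian pair with $m = 2$, $v = 1$ serves as an illustration.

```python
import math

def argmax(scores):
    """Index of the largest score (ties go to the smaller index)."""
    return max(range(len(scores)), key=lambda i: scores[i])

def ml_rule(y, log_likelihoods):
    """delta_ML(y) = argmax_i log p(y|i), as in (6.5)."""
    return argmax([ll(y) for ll in log_likelihoods])

def map_rule(y, log_likelihoods, priors):
    """delta_MAP(y) = argmax_i [log pi_i + log p(y|i)], as in (6.6)."""
    return argmax([math.log(p) + ll(y) for p, ll in zip(priors, log_likelihoods)])

def gaussian_loglik(mean, var):
    """Returns the function y -> log of the N(mean, var) density at y."""
    return lambda y: -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)

lls = [gaussian_loglik(0.0, 1.0), gaussian_loglik(2.0, 1.0)]
print(ml_rule(1.2, lls))               # ML threshold is m/2 = 1, so this says H_1
print(map_rule(1.2, lls, [0.9, 0.1]))  # a strong prior on H_0 keeps the decision at H_0
```

Working with log densities mirrors the rightmost forms of (6.5) and (6.6), and with equal priors `map_rule` returns the same index as `ml_rule`, which is the first property above.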

Theorem 6.1.1 The MAP rule (6.6) minimizes the probability of error.

Proof of Theorem 6.1.1: We show that the MAP rule maximizes the probability of correct decision. To do this, consider an arbitrary decision rule $\delta$, with corresponding decision regions $\{\Gamma_i\}$. The conditional probabilities of correct decision are given by

$$P_{c|i} = P[Y \in \Gamma_i|H_i] = \int_{\Gamma_i} p(y|i)\,dy, \qquad i = 0, 1, \ldots, M-1$$

so that the average probability of correct decision is

$$P_c = \sum_{i=0}^{M-1} \pi_i P_{c|i} = \sum_{i=0}^{M-1} \int_{\Gamma_i} \pi_i\, p(y|i)\,dy$$

Any point $y \in \Gamma$ belongs to exactly one of the M decision regions. If we decide to put it in $\Gamma_i$, then the point contributes the term $\pi_i p(y|i)$ to the integrand. Since we wish to maximize the overall integral, we choose to put $y$ in the decision region for which it makes the largest contribution to the integrand. Thus, we put it in $\Gamma_i$ so as to maximize $\pi_i p(y|i)$, which is precisely the MAP rule (6.6). □
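Theorem 6.1.1 covers all decision rules. A quick numerical illustration, restricted to threshold rules of the form “say $H_1$ iff $y > t$” for the Gaussian pair $H_0 : N(0, v^2)$, $H_1 : N(m, v^2)$, is to scan $t$ on a grid and compare the minimizer of $P_e$ with the MAP threshold $m/2 + (v^2/m)\log(\pi_0/\pi_1)$, which follows from comparing $\pi_1 p(y|1)$ with $\pi_0 p(y|0)$. The values $m = 2$, $v = 1$, $\pi_0 = 0.8$ and the grid are illustrative assumptions, and `threshold_error` is a hypothetical helper.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def threshold_error(t, m, v, pi0):
    """Average error probability of 'say H_1 iff y > t' for
    H_0: N(0, v^2) versus H_1: N(m, v^2)."""
    pe0 = q_function(t / v)          # P[Y > t | H_0]
    pe1 = q_function((m - t) / v)    # P[Y <= t | H_1]
    return pi0 * pe0 + (1 - pi0) * pe1

m, v, pi0 = 2.0, 1.0, 0.8
grid = [-2.0 + 0.001 * k for k in range(6001)]      # t from -2 to 4
best_t = min(grid, key=lambda t: threshold_error(t, m, v, pi0))
t_map = m / 2 + (v ** 2 / m) * math.log(pi0 / (1 - pi0))
print(best_t, t_map)
print(threshold_error(t_map, m, v, pi0), threshold_error(m / 2, m, v, pi0))
```

The grid minimizer should sit next to `t_map`, and the MAP threshold should beat the midpoint $m/2$ whenever the priors are unequal.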


Figure  6.1:  Hypothesis  testing  with  exponentially  distributed  observations. 


Example 6.1.1 (Hypothesis testing with exponentially distributed observations): A binary hypothesis problem is specified as follows:

$$H_0 : Y \sim \mathrm{Exp}(1), \qquad H_1 : Y \sim \mathrm{Exp}(1/4)$$

where $\mathrm{Exp}(\mu)$ denotes an exponential distribution with density $\mu e^{-\mu y}$, CDF $1 - e^{-\mu y}$ and complementary CDF $e^{-\mu y}$, where $y \ge 0$ (all the probability mass falls on the nonnegative numbers). Note that the mean of an $\mathrm{Exp}(\mu)$ random variable is $1/\mu$. Thus, in our case, the mean under $H_0$ is 1, while the mean under $H_1$ is 4.

(a) Find the ML rule and the corresponding conditional error probabilities.

(b) Find the MPE rule when the prior probability of $H_1$ is 1/5. Also find the conditional and average error probabilities.

Solution:

(a) As shown in Figure 6.1, we have

$$p(y|0) = e^{-y} I_{\{y \ge 0\}}, \qquad p(y|1) = \frac{1}{4} e^{-y/4} I_{\{y \ge 0\}}$$

The ML rule is given by

$$p(y|1) \underset{H_0}{\overset{H_1}{\gtrless}} p(y|0)$$

which reduces to

$$\frac{1}{4} e^{-y/4} \underset{H_0}{\overset{H_1}{\gtrless}} e^{-y} \qquad (y \ge 0)$$

Taking logarithms on both sides and simplifying, we obtain that the ML rule is given by

$$y \underset{H_0}{\overset{H_1}{\gtrless}} \frac{4}{3} \log 4 = 1.8484$$

The conditional error probabilities are

$$P_{e|0} = P[\text{say } H_1|H_0] = P\left[Y > \tfrac{4}{3}\log 4 \,\middle|\, H_0\right] = e^{-\frac{4}{3}\log 4} = \left(\tfrac{1}{4}\right)^{4/3} \approx 0.1575$$

$$P_{e|1} = P[\text{say } H_0|H_1] = P\left[Y \le \tfrac{4}{3}\log 4 \,\middle|\, H_1\right] = 1 - e^{-\frac{1}{3}\log 4} = 1 - \left(\tfrac{1}{4}\right)^{1/3} \approx 0.37$$

These conditional error probabilities are rather high, telling us that exponentially distributed observations with different means do not give us high-quality information about the hypotheses.

(b) The MPE rule is given by

$$\pi_1 p(y|1) \underset{H_0}{\overset{H_1}{\gtrless}} \pi_0 p(y|0)$$

which reduces to

$$\frac{1}{5} \cdot \frac{1}{4} e^{-y/4} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{4}{5} e^{-y}$$

This gives

$$y \underset{H_0}{\overset{H_1}{\gtrless}} \frac{4}{3} \log 16 = 3.6968$$

Proceeding as in (a), we obtain

$$P_{e|0} = e^{-\frac{4}{3}\log 16} = \left(\tfrac{1}{16}\right)^{4/3} \approx 0.0248, \qquad P_{e|1} = 1 - e^{-\frac{1}{3}\log 16} = 1 - \left(\tfrac{1}{16}\right)^{1/3} \approx 0.6031$$

with average error probability

$$P_e = \pi_0 P_{e|0} + \pi_1 P_{e|1} = (4/5)(0.0248) + (1/5)(0.6031) = 0.1405$$

Since the prior probability of $H_1$ is small, the MPE rule is biased towards guessing that $H_0$ is true. In this case, the decision rule is so skewed that the conditional probability of error under $H_1$ is actually worse than a random guess. Taking this one step further, if the prior probability of $H_1$ actually becomes zero, then the MPE rule would always guess that $H_0$ is true. In this case, the conditional probability of error under $H_1$ would be one! This shows that we must be careful about modeling when applying the MAP rule: if we are wrong about our prior probabilities, and $H_1$ does occur with nonzero probability, then our performance would be quite poor.
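The numbers in Example 6.1.1 can be reproduced with a few lines of code. The sketch assumes, as in the example, that $\mathrm{Exp}(\mu)$ has density $\mu e^{-\mu y}$ for $y \ge 0$. Because the two weighted densities cross only once, both rules reduce to a threshold on $y$, found by solving $\pi_1 \mu_1 e^{-\mu_1 y} = \pi_0 \mu_0 e^{-\mu_0 y}$. The helper names are hypothetical.

```python
import math

MU0, MU1 = 1.0, 0.25        # H_0: Exp(1), H_1: Exp(1/4); note MU0 > MU1

def exp_ccdf(mu, y):
    """P[Y > y] for Y ~ Exp(mu) and y >= 0."""
    return math.exp(-mu * y)

def threshold(pi0, pi1):
    """Point where pi1*p(y|1) = pi0*p(y|0); the rule says H_1 above it."""
    return math.log(pi0 * MU0 / (pi1 * MU1)) / (MU0 - MU1)

def error_probs(t, pi0, pi1):
    pe0 = exp_ccdf(MU0, t)          # say H_1 although H_0 is true
    pe1 = 1.0 - exp_ccdf(MU1, t)    # say H_0 although H_1 is true
    return pe0, pe1, pi0 * pe0 + pi1 * pe1

t_ml = threshold(0.5, 0.5)          # equal priors give the ML rule
t_mpe = threshold(0.8, 0.2)         # prior probability of H_1 is 1/5
print(t_ml, error_probs(t_ml, 0.5, 0.5)[:2])
print(t_mpe, error_probs(t_mpe, 0.8, 0.2))
```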


Both  the  ML  and  MAP  rules  involve  comparison  of  densities,  and  it  is  convenient  to  express 
them  in  terms  of  a  ratio  of  densities,  or  likelihood  ratio,  as  discussed  next. 


Binary hypothesis testing and the likelihood ratio: For binary hypothesis testing, the ML rule (6.5) reduces to

$$p(y|1) \underset{H_0}{\overset{H_1}{\gtrless}} p(y|0), \quad \text{or} \quad \frac{p(y|1)}{p(y|0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \tag{6.7}$$

The ratio of conditional densities appearing above is defined to be the likelihood ratio (LR) $L(y)$, a function of fundamental importance in hypothesis testing. Formally, we define the likelihood ratio as

$$L(y) = \frac{p(y|1)}{p(y|0)} \tag{6.8}$$

Likelihood ratio test: A likelihood ratio test (LRT) is a decision rule in which we compare the likelihood ratio to a threshold:

$$L(y) \underset{H_0}{\overset{H_1}{\gtrless}} \gamma$$

where the choice of $\gamma$ depends on our performance criterion. An equivalent form is the log likelihood ratio test (LLRT), where the log of the likelihood ratio is compared with a threshold.

We have already shown in (6.7) that the ML rule is an LRT with threshold $\gamma = 1$. From (6.6), we see that the MAP, or MPE, rule is also an LRT:

$$\pi_1 p(y|1) \underset{H_0}{\overset{H_1}{\gtrless}} \pi_0 p(y|0), \quad \text{or} \quad \frac{p(y|1)}{p(y|0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\pi_0}{\pi_1}$$

This is important enough to restate formally.

ML and MPE rules are likelihood ratio tests.

$$L(y) \underset{H_0}{\overset{H_1}{\gtrless}} 1 \quad \text{or} \quad \log L(y) \underset{H_0}{\overset{H_1}{\gtrless}} 0, \qquad \text{ML} \tag{6.9}$$

$$L(y) \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\pi_0}{\pi_1} \quad \text{or} \quad \log L(y) \underset{H_0}{\overset{H_1}{\gtrless}} \log\frac{\pi_0}{\pi_1}, \qquad \text{MAP/MPE} \tag{6.10}$$
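In code, (6.9) and (6.10) differ only in the threshold applied to $\log L(y)$. The minimal sketch below assumes the log densities are available as functions and reuses the exponential pair of Example 6.1.1 (with $\pi_1 = 1/5$ for the MAP case); `llrt` is a hypothetical helper name. Comparing $\log L(y)$ rather than $L(y)$ avoids overflow or underflow when the densities are extreme.

```python
import math

def llrt(y, log_p1, log_p0, log_gamma):
    """Log likelihood ratio test: say H_1 (return 1) if
    log L(y) = log p(y|1) - log p(y|0) exceeds log_gamma, else H_0 (return 0)."""
    return 1 if log_p1(y) - log_p0(y) > log_gamma else 0

# Exponential pair of Example 6.1.1, valid for y >= 0
def log_p0(y):
    return -y                            # Exp(1)

def log_p1(y):
    return math.log(0.25) - y / 4        # Exp(1/4)

ML_LOG_GAMMA = 0.0                       # gamma = 1, as in (6.9)
MAP_LOG_GAMMA = math.log(0.8 / 0.2)      # gamma = pi0/pi1, as in (6.10)

for y in (1.0, 2.0, 3.0, 4.0):
    print(y, llrt(y, log_p1, log_p0, ML_LOG_GAMMA), llrt(y, log_p1, log_p0, MAP_LOG_GAMMA))
```

The decisions switch at $y \approx 1.85$ for ML and $y \approx 3.70$ for MAP, matching the thresholds found in Example 6.1.1.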


We now specialize further to the setting of Example 5.6.3. The conditional densities are as shown in Figure 6.2. Since this example is fundamental to our understanding of signaling in AWGN, let us give it a name, the basic Gaussian example, and summarize the set-up in the language of hypothesis testing:

$$H_0 : Y \sim N(0, v^2), \quad H_1 : Y \sim N(m, v^2); \qquad p(y|0) = \frac{e^{-\frac{y^2}{2v^2}}}{\sqrt{2\pi v^2}}, \quad p(y|1) = \frac{e^{-\frac{(y-m)^2}{2v^2}}}{\sqrt{2\pi v^2}} \tag{6.11}$$

Figure 6.2: Conditional densities for the basic Gaussian example.

Likelihood ratio for basic Gaussian example: Substituting (6.11) into (6.8) and simplifying (this is left as an exercise), we obtain that the likelihood ratio for the basic Gaussian example is

$$L(y) = \exp\left(\frac{1}{v^2}\left(my - \frac{m^2}{2}\right)\right), \qquad \log L(y) = \frac{1}{v^2}\left(my - \frac{m^2}{2}\right) \tag{6.12}$$

ML and MAP rules for basic Gaussian example: Using (6.12) in (6.9), we leave it as an exercise to check that the ML rule reduces to

$$Y \underset{H_0}{\overset{H_1}{\gtrless}} \frac{m}{2}, \qquad \text{ML rule } (m > 0) \tag{6.13}$$

(check that the inequalities get reversed for $m < 0$). This is exactly the “sensible” rule that we analyzed in Example 5.6.3. Using (6.12) in (6.10), we obtain the MAP rule:

$$Y \underset{H_0}{\overset{H_1}{\gtrless}} \frac{m}{2} + \frac{v^2}{m} \log\frac{\pi_0}{\pi_1}, \qquad \text{MAP rule } (m > 0) \tag{6.14}$$
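A direct numerical check of the exercise behind (6.13) and (6.14) is to confirm that the threshold form and the LLR form (6.10) give the same decision on a few test points, including points just either side of the threshold. The sketch assumes illustrative values $m = 2$, $v = 1$, $\pi_0 = 1/3$; the function names are hypothetical.

```python
import math

def log_lr_basic_gaussian(y, m, v):
    """log L(y) = (m*y - m^2/2) / v^2 for H_0: N(0, v^2) vs H_1: N(m, v^2), as in (6.12)."""
    return (m * y - m * m / 2) / v ** 2

def map_threshold(m, v, pi0, pi1):
    """Threshold on Y in the MAP rule (6.14); assumes m > 0."""
    return m / 2 + (v ** 2 / m) * math.log(pi0 / pi1)

m, v, pi0, pi1 = 2.0, 1.0, 1 / 3, 2 / 3
t = map_threshold(m, v, pi0, pi1)
for y in (t - 1e-6, t + 1e-6, -3.0, 0.5, 4.0):
    by_threshold = int(y > t)                                            # form (6.14)
    by_llr = int(log_lr_basic_gaussian(y, m, v) > math.log(pi0 / pi1))   # form (6.10)
    assert by_threshold == by_llr
print(t)
```

With equal priors the logarithm vanishes and the threshold collapses to $m/2$, which is the ML rule (6.13).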


Example 6.1.2 (ML versus MAP for the basic Gaussian example): For the basic Gaussian example, we now know that the decision rule in Example 5.6.3 is the ML rule, and we showed in that example that the performance of this rule is given by

$$P_{e|0} = P_{e|1} = Q\left(\frac{m}{2v}\right) = Q\left(\sqrt{\mathrm{SNR}/2}\right)$$

We also saw that at 13 dB SNR, the error probability for the ML rule is

$$P_{e,ML} = 7.8 \times 10^{-4}$$

regardless of the prior probabilities. For equal priors, the ML rule is also MPE, and we cannot hope to do better than this. Let us now see what happens when the prior probability of $H_0$ is $\pi_0 = 1/3$. The ML rule is no longer MPE, and we should be able to do better by using the MAP rule. We leave it as an exercise to show that the conditional error probabilities for the MAP rule are given by

$$P_{e|0} = Q\left(\frac{m}{2v} + \frac{v}{m}\log\frac{\pi_0}{\pi_1}\right), \qquad P_{e|1} = Q\left(\frac{m}{2v} - \frac{v}{m}\log\frac{\pi_0}{\pi_1}\right) \tag{6.15}$$

Plugging in the numbers for SNR of 13 dB and $\pi_0 = 1/3$, we obtain

$$P_{e|0} = 1.1 \times 10^{-3}, \qquad P_{e|1} = 5.34 \times 10^{-4}$$

which averages to

$$P_{e,MAP} = 7.3 \times 10^{-4}$$

a slight improvement on the error probability of the ML rule.
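The numbers in Example 6.1.2 can be recomputed from (6.15). The sketch assumes, consistent with $P_{e,ML} = Q(\sqrt{\mathrm{SNR}/2})$, that $\mathrm{SNR} = m^2/(2v^2)$, so that $m/(2v) = \sqrt{\mathrm{SNR}/2}$; $Q$ is built from the standard library's `math.erfc`. The printed values should agree with the rounded figures quoted above to within a few percent, not necessarily digit for digit.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

snr = 10 ** (13.0 / 10)            # 13 dB
a = math.sqrt(snr / 2)             # m/(2v), assuming SNR = m^2/(2 v^2)
v_over_m = 1 / (2 * a)

pe_ml = q_function(a)              # ML rule, whatever the priors

pi0, pi1 = 1 / 3, 2 / 3
shift = v_over_m * math.log(pi0 / pi1)
pe0_map = q_function(a + shift)    # (6.15)
pe1_map = q_function(a - shift)
pe_map = pi0 * pe0_map + pi1 * pe1_map
print(f"ML {pe_ml:.3g}; MAP {pe0_map:.3g}, {pe1_map:.3g}, average {pe_map:.3g}")
```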

Figure 6.3 shows the results of further numerical experiments (see caption for discussion).

Figure 6.3: Conditional and average error probabilities for the MAP receiver compared to the error probability for the ML receiver. Panels: (a) Dependence on SNR ($\pi_0 = 0.3$); (b) Dependence on priors (SNR = 10 dB). We consider the basic Gaussian example, fixing the priors and varying SNR in (a), and fixing SNR and varying the priors in (b). For the MAP rule, the conditional error probability given a hypothesis increases as the prior probability of the hypothesis decreases. The average error probability for the MAP rule is always smaller than that of the ML rule (which is the MAP rule for equal priors) when $\pi_0 \ne 1/2$. The MAP error probability tends towards zero as $\pi_0 \to 0$ or $\pi_0 \to 1$.


6.1.3 Soft Decisions

We have so far considered hard decision rules in which we must choose exactly one of the M hypotheses. In doing so, we are throwing away a lot of information in the observation. For example, suppose that we are testing $H_0 : Y \sim N(0, 4)$ versus $H_1 : Y \sim N(10, 4)$ with equal priors, so that the MPE rule is $Y \underset{H_0}{\overset{H_1}{\gtrless}} 5$. We would guess $H_1$ if $Y = 5.1$ as well as if $Y = 10.3$, but we would be a lot more confident about our guess in the latter instance. Rather than throwing away this information, we can employ soft decisions that convey reliability information which could be used at a higher layer, for example, by a decoder which is processing a codeword consisting of many bits.

Actually, we already know how to compute soft decisions: the posterior probabilities $P[H_i|Y = y]$, $i = 0, 1, \ldots, M-1$, that appear in the MAP rule are actually the most information that we can hope to get about the hypotheses from the observation. For notational compactness, let us denote these by $\pi_i(y)$. The posterior probabilities can be computed using Bayes’ rule as follows:

$$\pi_i(y) = P[H_i|Y = y] = \frac{\pi_i\, p(y|i)}{p(y)} = \frac{\pi_i\, p(y|i)}{\sum_{j=0}^{M-1} \pi_j\, p(y|j)} \tag{6.16}$$


In  practice,  we  may  settle  for  quantized  soft  decisions  which  convey  less  information  than  the 
posterior  probabilities  due  to  tradeoffs  in  precision  or  complexity  versus  performance. 
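Equation (6.16) is a one-line computation once the weighted likelihoods $\pi_i p(y|i)$ are available. The minimal sketch below (hypothetical helper names) applies it to the $N(0, 4)$ versus $N(10, 4)$ example with equal priors: both $Y = 5.1$ and $Y = 10.3$ lead to the hard decision $H_1$, but the posterior of $H_1$ is only slightly above one half in the first case and essentially one in the second.

```python
import math

def posteriors(y, priors, densities):
    """Posterior probabilities pi_i(y) via Bayes' rule (6.16)."""
    weights = [p * f(y) for p, f in zip(priors, densities)]
    total = sum(weights)                      # this is p(y)
    return [w / total for w in weights]

def gaussian_pdf(mean, var):
    """Returns the N(mean, var) density as a function of y."""
    return lambda y: math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

dens = [gaussian_pdf(0.0, 4.0), gaussian_pdf(10.0, 4.0)]
soft = {y: posteriors(y, [0.5, 0.5], dens) for y in (5.1, 10.3)}
for y, post in soft.items():
    print(y, post)
```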


Example 6.1.3 (Soft decisions for 4PAM in AWGN): Consider a 4-ary hypothesis testing problem modeled as follows:

$$H_0 : Y \sim N(-3A, \sigma^2), \quad H_1 : Y \sim N(-A, \sigma^2), \quad H_2 : Y \sim N(A, \sigma^2), \quad H_3 : Y \sim N(3A, \sigma^2)$$

This is a model that arises for 4PAM signaling in AWGN, as we see later. For $\sigma^2 = 1$, $A = 1$ and $Y = -1.5$, find the posterior probabilities if $\pi_0 = 0.4$ and $\pi_1 = \pi_2 = \pi_3 = 0.2$.

Solution: The posterior probability for the $i$th hypothesis is of the form

$$\pi_i(y) = c\, \pi_i\, e^{-\frac{(y - m_i)^2}{2\sigma^2}}$$

where $m_i \in \{\pm A, \pm 3A\}$ is the conditional mean under $H_i$, and where $c$ is a constant that does not depend on $i$. Since the posterior probabilities must sum to one, we have

$$\sum_{j=0}^{3} \pi_j(y) = c \sum_{j=0}^{3} \pi_j\, e^{-\frac{(y - m_j)^2}{2\sigma^2}} = 1$$

Solving for $c$, we obtain

$$\pi_i(y) = \frac{\pi_i\, e^{-\frac{(y - m_i)^2}{2\sigma^2}}}{\sum_{j=0}^{3} \pi_j\, e^{-\frac{(y - m_j)^2}{2\sigma^2}}}$$

Plugging in the numbers, we obtain

$$\pi_0(-1.5) = 0.4121, \quad \pi_1(-1.5) = 0.5600, \quad \pi_2(-1.5) = 0.0279, \quad \pi_3(-1.5) = 2.5 \times 10^{-5}$$

The MPE hard decision in this case is $\delta_{MPE}(-1.5) = 1$, but note that the posterior probability for $H_0$ is also quite high, which is information that would have been thrown away if only hard decisions were reported. However, if the noise strength is reduced, then the hard decision becomes more reliable. For example, for $\sigma^2 = 0.1$, we obtain

$$\pi_0(-1.5) = 9.08 \times 10^{-5}, \quad \pi_1(-1.5) = 0.9999, \quad \pi_2(-1.5) = 9.36 \times 10^{-14}, \quad \pi_3(-1.5) = 3.72 \times 10^{-44}$$

where it is not wise to trust some of the smaller numbers. Thus, we can be quite confident about the hard decision from the MPE rule in this case.
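The posteriors in Example 6.1.3 can be reproduced as follows. This sketch works in the log domain, normalizing with a log-sum-exp, as a precaution against underflow when the noise variance is small and some posteriors are extremely small; it is one common way to do this, not necessarily how the numbers above were obtained. `pam4_posteriors` is a hypothetical name.

```python
import math

def log_sum_exp(xs):
    """log(sum(exp(x) for x in xs)) computed without overflow or underflow."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))

def pam4_posteriors(y, A, var, priors):
    """Posteriors for H_i: Y ~ N(m_i, var) with m_i = -3A, -A, A, 3A."""
    means = (-3 * A, -A, A, 3 * A)
    # log of pi_i * exp(-(y - m_i)^2 / (2 var)); the common Gaussian constant cancels
    logw = [math.log(p) - (y - mi) ** 2 / (2 * var) for p, mi in zip(priors, means)]
    norm = log_sum_exp(logw)
    return [math.exp(lw - norm) for lw in logw]

priors = [0.4, 0.2, 0.2, 0.2]
post_var1 = pam4_posteriors(-1.5, 1.0, 1.0, priors)
post_var01 = pam4_posteriors(-1.5, 1.0, 0.1, priors)
print(["%.4g" % p for p in post_var1])
print(["%.3g" % p for p in post_var01])
```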


For binary hypothesis testing, it suffices to output one of the two posterior probabilities, since they sum to one. However, it is often more convenient to output the log of the ratio of the posteriors, termed the log likelihood ratio (LLR):

$$\mathrm{LLR}(y) = \log\frac{P[H_0|Y = y]}{P[H_1|Y = y]} = \log\frac{\pi_0}{\pi_1} + \log\frac{p(y|0)}{p(y|1)} \tag{6.17}$$


Notice  how  the  information  from  the  priors  and  the  information  from  the  observations,  each  of 
which  also  takes  the  form  of  an  LLR,  add  up  in  the  overall  LLR.  This  simple  additive  combining  of 
information  is  exploited  in  sophisticated  decoding  algorithms  in  which  information  from  one  part 
of  the  decoder  provides  priors  for  another  part  of  the  decoder.  Note  that  the  LLR  contribution 
due  to  the  priors  is  zero  for  equal  priors. 


Example 6.1.4 (LLRs for binary antipodal signaling): Consider $H_0 : Y \sim N(A, \sigma^2)$ versus $H_1 : Y \sim N(-A, \sigma^2)$. We shall see later how this model arises for binary antipodal signaling in AWGN. We leave it as an exercise to show that the LLR is given by

$$\mathrm{LLR}(y) = \frac{2Ay}{\sigma^2}$$

for equal priors.
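The sketch below checks the equal-prior closed form for binary antipodal signaling and the additive structure of the LLR numerically. It assumes the convention $\mathrm{LLR}(y) = \log(P[H_0|Y=y]/P[H_1|Y=y])$, which splits into a prior term $\log(\pi_0/\pi_1)$ plus an observation term $\log(p(y|0)/p(y|1))$, with $H_0 : N(A, \sigma^2)$ and $H_1 : N(-A, \sigma^2)$; under this convention the equal-prior LLR is $2Ay/\sigma^2$. The values of $A$, $\sigma^2$ and the test points are illustrative.

```python
import math

def gaussian_logpdf(y, mean, var):
    """Log of the N(mean, var) density at y."""
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)

def llr(y, A, var, pi0=0.5, pi1=0.5):
    """log(P[H_0|y] / P[H_1|y]) for H_0: N(A, var) versus H_1: N(-A, var),
    written as a prior term plus an observation term."""
    prior_term = math.log(pi0 / pi1)
    obs_term = gaussian_logpdf(y, A, var) - gaussian_logpdf(y, -A, var)
    return prior_term + obs_term

A, var = 1.0, 0.5
for y in (-0.7, 0.0, 0.3, 1.2):
    # equal priors: closed form 2*A*y/var
    assert abs(llr(y, A, var) - 2 * A * y / var) < 1e-12
    # unequal priors simply add log(pi0/pi1)
    assert abs(llr(y, A, var, 0.7, 0.3) - (math.log(0.7 / 0.3) + 2 * A * y / var)) < 1e-12
print(llr(1.2, A, var))
```

The sign of the LLR carries the hard decision and its magnitude the reliability, which is what a downstream decoder consumes.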


6.2 Signal Space Concepts

We have seen in the previous section that the statistical relation between the hypotheses $\{H_i\}$ and the observation $Y$ is expressed in terms of the conditional densities $p(y|i)$. We are now interested in applying this framework to derive optimal decision rules (and the receiver structures required to implement them) for the problem of M-ary signaling in AWGN. In the language of hypothesis testing, the observation here is the received signal $y(t)$, modeled as follows:

$$H_i : y(t) = s_i(t) + n(t), \qquad i = 0, 1, \ldots, M-1 \tag{6.18}$$

where $s_i(t)$ is the transmitted signal corresponding to hypothesis $H_i$, and $n(t)$ is WGN with PSD $\sigma^2 = N_0/2$. Before we can apply the framework of the previous section, however, we must figure out how to define conditional densities when the observation is a continuous-time signal. Here is how we do it:

• We first observe that, while the signals $s_i(t)$ live in an infinite-dimensional, continuous-time space, if we are only interested in the M signals that could be transmitted under each of the M hypotheses, then we can limit attention to a finite-dimensional subspace of dimension at most M. We call this the signal space. We can then express the signals as vectors corresponding to an expansion with respect to an orthonormal basis for the subspace.

• The projection of WGN onto the signal space gives us a noise vector whose components are i.i.d. Gaussian. Furthermore, we observe that the component of the received signal orthogonal to the signal space is irrelevant: that is, we can throw it away without compromising performance.

• We can therefore restrict attention to the projection of the received signal onto the signal space without loss of performance. This projection can be expressed as a finite-dimensional vector which is modeled as a discrete-time analogue of (6.18). We can now apply the hypothesis testing framework of Section 6.1 to infer the optimal (ML and MPE) decision rules.

• We then translate the optimal decision rules back to continuous time to infer the structure of the optimal receiver.


6.2.1 Representing signals as vectors

Let us begin with an example illustrating how continuous-time signals can be represented as finite-dimensional vectors by projecting onto the signal space.

Figure 6.4: For linear modulation with no intersymbol interference, the complex symbols themselves provide a two-dimensional signal space representation. Three different constellations are shown here: QPSK/4PSK/4QAM, 8PSK and 16QAM.


Example 6.2.1 (Signal space for two-dimensional modulation): Consider a single complex-valued symbol $b = b_c + j b_s$ (assume that there is no intersymbol interference) sent using two-dimensional passband linear modulation. The set of possible transmitted signals is given by

$$s_{b_c,b_s}(t) = b_c\, p(t) \cos 2\pi f_c t - b_s\, p(t) \sin 2\pi f_c t$$

where $(b_c, b_s)$ takes M possible values for an M-ary constellation (e.g., M = 4 for QPSK, M = 16 for 16QAM), and where $p(t)$ is a baseband pulse of bandwidth smaller than the carrier frequency $f_c$. Setting $\phi_c(t) = p(t) \cos 2\pi f_c t$ and $\phi_s(t) = -p(t) \sin 2\pi f_c t$, we see that we can write the set of transmitted signals as a linear combination of these signals as follows:

$$s_{b_c,b_s}(t) = b_c\, \phi_c(t) + b_s\, \phi_s(t)$$

so that the signal space has dimension at most 2. From Chapter 2, we know that $\phi_c$ and $\phi_s$ are orthogonal (I-Q orthogonality), and hence linearly independent. Thus, the signal space has dimension exactly 2. Noting that $\|\phi_c\|^2 = \|\phi_s\|^2 = \frac{1}{2}\|p\|^2$, the normalized versions of $\phi_c$ and $\phi_s$ provide an orthonormal basis for the signal space:

$$\psi_c(t) = \frac{\phi_c(t)}{\|\phi_c\|} = \frac{\sqrt{2}}{\|p\|}\,\phi_c(t), \qquad \psi_s(t) = \frac{\phi_s(t)}{\|\phi_s\|} = \frac{\sqrt{2}}{\|p\|}\,\phi_s(t)$$

We can now write

$$s_{b_c,b_s}(t) = \frac{\|p\|}{\sqrt{2}}\, b_c\, \psi_c(t) + \frac{\|p\|}{\sqrt{2}}\, b_s\, \psi_s(t)$$

With respect to this basis, the signals can be represented as two-dimensional vectors:

$$s_{b_c,b_s}(t) \leftrightarrow \mathbf{s}_{b_c,b_s} = \frac{\|p\|}{\sqrt{2}} \begin{pmatrix} b_c \\ b_s \end{pmatrix}$$

That is, up to scaling, the signal space representation for the transmitted signals is simply the two-dimensional symbol $(b_c, b_s)^T$. Indeed, while we have been careful about keeping track of the scaling factor in this example, we shall drop it henceforth, because, as we shall soon see, what matters in performance is the signal-to-noise ratio, rather than the absolute signal or noise strength.
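The claims in Example 6.2.1 (that $\phi_c \perp \phi_s$, that $\|\phi_c\|^2 = \|\phi_s\|^2 = \|p\|^2/2$, and that the coordinates are $\|p\| b_c/\sqrt{2}$ and $\|p\| b_s/\sqrt{2}$) can be checked with sampled waveforms. The sketch assumes a hypothetical rectangular pulse on $[0, 1]$, a carrier $f_c = 10$ (an integer number of cycles over the pulse), and midpoint Riemann sums in place of integrals; all names are illustrative.

```python
import math

def inner(x, y, dt):
    """Midpoint Riemann-sum approximation of the integral of x(t) y(t)."""
    return sum(a * b for a, b in zip(x, y)) * dt

T, fc, n = 1.0, 10.0, 20_000          # hypothetical pulse duration, carrier, samples
dt = T / n
t = [(k + 0.5) * dt for k in range(n)]
p = [1.0] * n                          # rectangular baseband pulse on [0, T]
phi_c = [pk * math.cos(2 * math.pi * fc * tk) for pk, tk in zip(p, t)]
phi_s = [-pk * math.sin(2 * math.pi * fc * tk) for pk, tk in zip(p, t)]

energy_p = inner(p, p, dt)             # ||p||^2
norm_c = math.sqrt(inner(phi_c, phi_c, dt))
norm_s = math.sqrt(inner(phi_s, phi_s, dt))
psi_c = [x / norm_c for x in phi_c]    # orthonormal basis functions
psi_s = [x / norm_s for x in phi_s]

bc, bs = 1.0, -1.0                     # one QPSK symbol
s = [bc * a + bs * b for a, b in zip(phi_c, phi_s)]
coords = (inner(s, psi_c, dt), inner(s, psi_s, dt))
print(inner(phi_c, phi_s, dt), norm_c ** 2, energy_p / 2, coords)
```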


Orthogonal modulation provides another example where an orthonormal basis for the signal space is immediately obvious. For example, if $s_1, \ldots, s_M$ are orthogonal signals with equal energy $\|s_i\|^2 = E_s$, then $\psi_i(t) = s_i(t)/\sqrt{E_s}$ provide an orthonormal basis for the signal space, and the vector representation of the $i$th signal is the scaled unit vector $\sqrt{E_s}\,(0, \ldots, 0, 1 \text{ (in } i\text{th position)}, 0, \ldots, 0)^T$.

Yet another example where an orthonormal basis can be determined by inspection is shown in Figures 6.5 and 6.6, and discussed in Example 6.2.2.
Figure 6.5: Four signals spanning a three-dimensional signal space.

Figure 6.6: An orthonormal basis for the signal set in Figure 6.5, obtained by inspection.


Example 6.2.2 (Developing a signal space representation for a 4-ary signal set): Consider the example depicted in Figure 6.5, where there are 4 possible transmitted signals, $s_0, \ldots, s_3$. It is clear from inspection that these span a three-dimensional signal space, with a convenient choice of basis signals

$$\psi_0(t) = I_{[0,1]}(t), \qquad \psi_1(t) = I_{[1,2]}(t), \qquad \psi_2(t) = I_{[2,3]}(t)$$

as shown in Figure 6.6. Let $\mathbf{s}_i = (s_i[0], s_i[1], s_i[2])^T$ denote the vector representation of the signal $s_i$ with respect to the basis, for $i = 0, 1, 2, 3$. That is, the coefficients of the vector $\mathbf{s}_i$ are such that

$$s_i(t) = \sum_{k=0}^{2} s_i[k]\, \psi_k(t)$$

we obtain, again by inspection, that

Now that we have seen some examples, it is time to be more precise about what we mean by the “signal space.” The signal space $\mathcal{S}$ is the finite-dimensional subspace (of dimension $n \le M$) spanned by $s_0(t), \ldots, s_{M-1}(t)$. That is, $\mathcal{S}$ consists of all signals of the form $a_0 s_0(t) + \cdots + a_{M-1} s_{M-1}(t)$, where $a_0, \ldots, a_{M-1}$ are arbitrary scalars. Let $\psi_0(t), \ldots, \psi_{n-1}(t)$ denote an orthonormal basis for $\mathcal{S}$. We have seen in the preceding examples that such a basis can often be determined by inspection. In general, however, given an arbitrary set of signals, we can always construct an orthonormal basis using the Gram-Schmidt procedure described below. We do not need to use this procedure often (in most settings of interest, the way to go from continuous to discrete time is clear), but we state it below for completeness.

Figure 6.7: Illustrating Step 0 and Step 1 of the Gram-Schmidt procedure.


Gram-Schmidt orthogonalization: The idea is to build up an orthonormal basis step by step, with the basis after the $m$th step spanning the first $m$ signals. The first basis function is a scaled version of the first signal (assuming this is nonzero; otherwise we proceed to the second signal without adding a basis function). We then consider the component of the second signal orthogonal to the first basis function. This component is nonzero if the second signal is linearly independent of the first; in this case, we introduce a basis function that is a scaled version of this component. See Figure 6.7. This procedure goes on until we have covered all M signals. The number of basis functions $n$ equals the dimension of the signal space, and satisfies $n \le M$. We can summarize the procedure as follows.

Letting $\mathcal{S}_{k-1}$ denote the subspace spanned by $s_0, \ldots, s_{k-1}$, the Gram-Schmidt algorithm proceeds iteratively: given an orthonormal basis for $\mathcal{S}_{k-1}$, it finds an orthonormal basis for $\mathcal{S}_k$. The procedure stops once all M signals have been processed. The method is identical to that used for finite-dimensional vectors, except that the definition of the inner product involves an integral, rather than a sum, for the continuous-time signals considered here.

Step 0 (Initialization): Let $\phi_0 = s_0$. If $\phi_0 \ne 0$, then set $\psi_0 = \phi_0/\|\phi_0\|$. Note that $\psi_0$ provides a basis function for $\mathcal{S}_0$.

Step k: Suppose that we have constructed an orthonormal basis $\mathcal{B}_{k-1} = \{\psi_0, \ldots, \psi_{m-1}\}$ for the subspace $\mathcal{S}_{k-1}$ spanned by the first $k$ signals, $s_0, \ldots, s_{k-1}$ (note that $m \le k$). Define

$$\phi_k(t) = s_k(t) - \sum_{i=0}^{m-1} \langle s_k, \psi_i \rangle\, \psi_i(t)$$

The signal $\phi_k(t)$ is the component of $s_k(t)$ orthogonal to the subspace $\mathcal{S}_{k-1}$. If $\phi_k \ne 0$, define a new basis function $\psi_m(t) = \phi_k(t)/\|\phi_k\|$, and update the basis as $\mathcal{B}_k = \{\psi_0, \ldots, \psi_{m-1}, \psi_m\}$. If $\phi_k = 0$, then $s_k \in \mathcal{S}_{k-1}$, and it is not necessary to update the basis; in this case, we set $\mathcal{B}_k = \mathcal{B}_{k-1} = \{\psi_0, \ldots, \psi_{m-1}\}$.

The procedure terminates after step $M-1$, which yields a basis $\mathcal{B} = \{\psi_0, \ldots, \psi_{n-1}\}$ for the signal space $\mathcal{S} = \mathcal{S}_{M-1}$. The basis is not unique, and may depend (and typically does depend) on the order in which we go through the signals in the set. We use the Gram-Schmidt procedure here mainly as a conceptual tool, in assuring us that there is indeed a finite-dimensional vector representation for a finite set of continuous-time signals.
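The procedure above carries over directly to sampled signals, with the inner product integral replaced by a Riemann sum. The sketch below is a minimal version under that assumption. The four piecewise-constant signals are hypothetical (they are not the signals of Figure 6.5) and were chosen so that the last one equals $s_0 + s_1$, which exercises the branch $\phi_k = 0$.

```python
import math

def inner(x, y, dt):
    """Riemann-sum approximation of the inner product integral."""
    return sum(a * b for a, b in zip(x, y)) * dt

def gram_schmidt(signals, dt, tol=1e-9):
    """Orthonormal basis for the span of sampled signals, taken in the given order."""
    basis = []
    for s in signals:
        # phi_k = s_k - sum_i <s_k, psi_i> psi_i: component orthogonal to the current basis
        phi = list(s)
        for psi in basis:
            c = inner(s, psi, dt)
            phi = [a - c * b for a, b in zip(phi, psi)]
        norm = math.sqrt(inner(phi, phi, dt))
        if norm > tol:                 # phi_k != 0: add a new basis function
            basis.append([a / norm for a in phi])
        # otherwise s_k already lies in the span, and the basis is unchanged
    return basis

SPU = 100                              # samples per unit time
dt = 1 / SPU

def piecewise(levels):
    """Signal on [0, len(levels)) that is constant on each unit interval."""
    return [float(lv) for lv in levels for _ in range(SPU)]

# Hypothetical 4-ary signal set on [0, 3]; the last signal equals s_0 + s_1
signals = [piecewise(levels) for levels in ([1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 2, 1])]
basis = gram_schmidt(signals, dt)
print(len(basis))                      # dimension n of the signal space
```

Reordering `signals` generally yields a different basis for the same space, in line with the non-uniqueness noted above.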


Figure 6.8: An orthonormal basis for the signal set in Figure 6.5, obtained by applying the Gram-Schmidt procedure. The unknowns A, B, and C are to be determined in Exercise 6.2.1.


Exercise 6.2.1 (Application of the Gram-Schmidt procedure): Apply the Gram-Schmidt procedure to the signal set in Figure 6.5. When the signals are considered in increasing order of index in the Gram-Schmidt procedure, verify that the basis signals are as in Figure 6.8, and fill in the missing numbers. While the basis thus obtained is not as “nice” as the one obtained by inspection in Figure 6.6, the Gram-Schmidt procedure has the advantage of general applicability.

Inner  products  are  preserved:  We  shall  soon  see  that  the  performance  of  M-ary  signaling 
in  AWGN  depends  only  on  the  inner  products  between  the  signals,  if  the  noise  PSD  is  hxed. 
Thus,  an  important  observation  when  mapping  the  continuous  time  hypothesis  testing  problem 
to  discrete  time  is  to  check  that  these  inner  products  are  preserved  when  projecting  onto  the 
signal  space.  Gonsider  the  continuous  time  inner  products 

{si,Sj)  =  j  Si{t)sj{t)dt  ,  0  j  =  0, 1, ...,  M  -  1  (6.19) 
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Now,  expressing  the  signals  in  terms  of  their  basis  expansions,  we  have 

n—1 

Si{t)  =  ^Si[k]'ijjk{t)  ,  i  =  0, 1,  ...,M  -  1 

k=0 

Plugging  into  (6.19),  we  obtain 

/n—1  n—1 

k=0  1=0 


Interchanging  integral  and  summations,  we  obtain 


n—1  n—1  p 

Si[k]sj[l]  /  'ipk{t)'4)i{t)dt 
k=0  1=0  d 

By  the  orthonormality  of  the  basis  functions  {'0^},  we  have 

1,  k  =  I 


{i)k,'tpi)  =  I  'tpk{t)'tpi{t)dt  =  6ki  =  ^  ^  ^  ^  ^ 

This  collapses  the  two  summations  into  one,  so  that  we  obtain 

f.  n—1 


{si,Sj)  =  /  Si{t)sj{t)dt  =  ^Si[k]sj[k]  =  {si,Sj) 


k=0 


(6.20) 


where the extreme right-hand side is the inner product of the signal vectors $\mathbf{s}_i = (s_i[0], \ldots, s_i[n-1])^T$
and $\mathbf{s}_j = (s_j[0], \ldots, s_j[n-1])^T$. This makes sense: the geometric relationship between signals
(which is what the inner products capture) should not depend on the basis with respect to which
they are expressed.
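A quick numerical check of (6.20), as a minimal Python sketch: the basis functions, signal vectors, and sampling grid below are hypothetical choices, and continuous time inner products are approximated by Riemann sums.

```python
import math

N = 1000                       # samples on the interval [0, 1)
dt = 1.0 / N
t = [k * dt for k in range(N)]

def inner(u, v):
    """Riemann-sum approximation of the continuous time inner product on [0, 1)."""
    return sum(a * b for a, b in zip(u, v)) * dt

def dot(x, y):
    """Inner product of two signal vectors."""
    return sum(a * b for a, b in zip(x, y))

# Hypothetical orthonormal basis on [0, 1): sqrt(2) cos(2 pi t) and sqrt(2) sin(2 pi t).
psi = [[math.sqrt(2) * math.cos(2 * math.pi * x) for x in t],
       [math.sqrt(2) * math.sin(2 * math.pi * x) for x in t]]

# Hypothetical signal vectors s_i, and waveforms s_i(t) = sum_k s_i[k] psi_k(t).
vectors = [(1.0, 0.0), (0.5, -2.0), (-1.5, 1.0)]
waveforms = [[c0 * p0 + c1 * p1 for p0, p1 in zip(psi[0], psi[1])]
             for c0, c1 in vectors]

# Projecting each waveform onto the basis recovers its coefficient vector.
recovered = [[inner(s, p) for p in psi] for s in waveforms]

# Largest gap between waveform inner products and vector inner products.
max_err = max(abs(inner(waveforms[i], waveforms[j]) - dot(vectors[i], vectors[j]))
              for i in range(len(vectors)) for j in range(len(vectors)))
print(max_err)   # tiny: floating point rounding only
```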


6.2.2  Modeling  WGN  in  signal  space 

What happens to the noise when we project onto the signal space? Define the noise projection
onto the $i$th basis function as


$$N_i = \langle n, \psi_i\rangle = \int n(t)\,\psi_i(t)\, dt, \quad i = 0, 1, \ldots, n-1$$

Then we can write the noise $n(t)$ as follows:

$$n(t) = \sum_{i=0}^{n-1} N_i\,\psi_i(t) + n^{\perp}(t) \tag{6.21}$$


where $n^{\perp}(t)$ is the projection of the noise orthogonal to the signal space. Thus, we can decompose
the noise into two parts: a noise vector $\mathbf{N} = (N_0, \ldots, N_{n-1})^T$ corresponding to the projection
onto the signal space, and a component $n^{\perp}(t)$ orthogonal to the signal space. In order to characterize
the statistics of these quantities, we need to consider random variables obtained by linear
processing of WGN. Specifically, consider random variables generated by passing WGN through
correlators:


$$Z_1 = \int_{-\infty}^{\infty} n(t)\,u_1(t)\, dt = \langle n, u_1\rangle$$

$$Z_2 = \int_{-\infty}^{\infty} n(t)\,u_2(t)\, dt = \langle n, u_2\rangle$$

where $u_1$ and $u_2$ are deterministic, finite energy signals. We can now state the following result.
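Theorem 6.2.1 (WGN through correlators): The random variables $Z_1 = \langle n, u_1\rangle$ and $Z_2 = \langle n, u_2\rangle$
are zero mean and jointly Gaussian, with

$$\mathrm{cov}(Z_1, Z_2) = \mathrm{cov}(\langle n, u_1\rangle, \langle n, u_2\rangle) = \sigma^2 \langle u_1, u_2\rangle$$

Specializing to $u_1 = u_2 = u$, we obtain that

$$\mathrm{var}(\langle n, u\rangle) = \mathrm{cov}(\langle n, u\rangle, \langle n, u\rangle) = \sigma^2 \|u\|^2$$

Thus, we obtain that $\mathbf{Z} = (Z_1, Z_2)^T \sim N(\mathbf{0}, \mathbf{C})$ with covariance matrix

$$\mathbf{C} = \begin{pmatrix} \sigma^2\|u_1\|^2 & \sigma^2\langle u_1, u_2\rangle \\ \sigma^2\langle u_1, u_2\rangle & \sigma^2\|u_2\|^2 \end{pmatrix}$$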

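Proof of Theorem 6.2.1: The random variables $Z_1 = \langle n, u_1\rangle$ and $Z_2 = \langle n, u_2\rangle$ are zero mean
and jointly Gaussian, since $n$ is zero mean and Gaussian. Their covariance is computed as

$$\begin{aligned}
\mathrm{cov}(\langle n, u_1\rangle, \langle n, u_2\rangle) &= E[\langle n, u_1\rangle\langle n, u_2\rangle] = E\left[\int n(t)\,u_1(t)\,dt \int n(s)\,u_2(s)\,ds\right] \\
&= \iint u_1(t)\,u_2(s)\,E[n(t)\,n(s)]\,dt\,ds = \iint u_1(t)\,u_2(s)\,\sigma^2\delta(t-s)\,dt\,ds \\
&= \sigma^2 \int u_1(t)\,u_2(t)\,dt = \sigma^2\langle u_1, u_2\rangle
\end{aligned}$$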

The  preceding  computation  is  entirely  analogous  to  the  ones  we  did  in  Example  5.8.2  and  in 
Section  5.10,  but  it  is  important  enough  that  we  repeat  some  points  that  we  had  mentioned 
then.  First,  we  need  to  use  two  different  variables  of  integration,  t  and  s,  in  order  to  make  sure 
we  capture  all  the  cross  terms.  Second,  when  we  take  the  expectation  inside  the  integrals,  we 
must  group  all  random  terms  inside  it.  Third,  the  two  integrals  collapse  into  one  because  the 
autocorrelation  function  of  WGN  is  impulsive.  Finally,  specializing  the  covariance  to  get  the 
variance  leads  to  the  remaining  results  stated  in  the  theorem.  □ 
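Theorem 6.2.1 can also be checked by simulation. The Python sketch below assumes a common discretization of WGN with PSD $\sigma^2$: i.i.d. Gaussian samples of variance $\sigma^2/\Delta t$ on a grid of spacing $\Delta t$ (a simulation assumption, not part of the theorem). The correlator signals $u_1$, $u_2$ and all numbers are hypothetical.

```python
import math
import random

def correlator_moments(u1, u2, sigma2, dt, trials, seed=1):
    """Monte Carlo estimates of mean(Z1), var(Z1), var(Z2), cov(Z1, Z2) for Zk = <n, uk>.

    Assumption (simulation model): WGN with PSD sigma2 is approximated on a
    grid of spacing dt by i.i.d. N(0, sigma2 / dt) samples.
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma2 / dt)
    z1, z2 = [], []
    for _ in range(trials):
        n = [rng.gauss(0.0, sd) for _ in range(len(u1))]
        z1.append(sum(a * b for a, b in zip(n, u1)) * dt)   # correlator 1
        z2.append(sum(a * b for a, b in zip(n, u2)) * dt)   # correlator 2
    m1, m2 = sum(z1) / trials, sum(z2) / trials
    var1 = sum((a - m1) ** 2 for a in z1) / trials
    var2 = sum((b - m2) ** 2 for b in z2) / trials
    cov = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / trials
    return m1, var1, var2, cov

# Hypothetical example on [0, 1): u1 = 1 on [0, 1), u2 = 1 on [0.5, 1).
dt, sigma2 = 0.02, 0.5                      # sigma2 = N0/2 with N0 = 1
u1 = [1.0] * 50
u2 = [0.0] * 25 + [1.0] * 25
m1, var1, var2, cov = correlator_moments(u1, u2, sigma2, dt, trials=10000)
# Theorem 6.2.1 predicts var1 = sigma2 ||u1||^2 = 0.5, var2 = sigma2 ||u2||^2 = 0.25,
# and cov = sigma2 <u1, u2> = 0.25.
print(m1, var1, var2, cov)
```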

We  can  now  provide  the  following  geometric  interpretation  of  WGN. 

Remark 6.2.1 (Geometric interpretation of WGN): Theorem 6.2.1 implies that the projection
of WGN along any “direction” in the space of signals (i.e., the result of correlating WGN
with a unit energy signal) has variance $\sigma^2 = N_0/2$. Also, its projections in orthogonal directions
are jointly Gaussian and uncorrelated random variables, and are therefore independent.

Noise projection on the signal space is discrete time WGN: It follows from the preceding
remark that the noise projections $N_i = \langle n, \psi_i\rangle$ along the orthonormal basis functions $\{\psi_i\}$ for
the signal space are i.i.d. $N(0, \sigma^2)$ random variables. In other words, the noise vector
$\mathbf{N} = (N_0, \ldots, N_{n-1})^T \sim N(\mathbf{0}, \sigma^2\mathbf{I})$, so that the components of $\mathbf{N}$ constitute discrete time white
Gaussian noise (“white” in this case means uncorrelated and having equal variance across all
components).


6.2.3  Hypothesis  testing  in  signal  space 

Now that we have the signal and noise models, we can put them together in our hypothesis
testing framework. Let us condition on hypothesis $H_i$. The received signal is given by

$$y(t) = s_i(t) + n(t) \tag{6.22}$$

Projecting  this  onto  the  signal  space  by  correlating  against  the  orthonormal  basis  functions,  we 
get 

$$Y[k] = \langle y, \psi_k\rangle = \langle s_i + n, \psi_k\rangle = s_i[k] + N[k], \quad k = 0, 1, \ldots, n-1$$
Figure 6.9: Illustration of signal space concepts. The noise projection $n^{\perp}(t)$ orthogonal to the
signal space is irrelevant. The relevant part of the received signal is the projection onto the signal
space, which equals the vector $\mathbf{Y} = \mathbf{s}_i + \mathbf{N}$ under hypothesis $H_i$.


Collecting  these  into  an  n-dimensional  vector,  we  get  the  model 

$$H_i: \mathbf{Y} = \mathbf{s}_i + \mathbf{N}$$

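Note that the vector $\mathbf{Y} = (Y[0], \ldots, Y[n-1])^T$ completely describes the component of the received
signal $y(t)$ in the signal space, given by

$$y_S(t) = \sum_{j=0}^{n-1} \langle y, \psi_j\rangle\,\psi_j(t) = \sum_{j=0}^{n-1} Y[j]\,\psi_j(t)$$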

The  component  of  the  received  signal  orthogonal  to  the  signal  space  is  given  by 

$$y^{\perp}(t) = y(t) - y_S(t)$$

It is shown in Appendix 6.A that this component is irrelevant to our decision. There are two
reasons for this, as elaborated in the appendix: first, there is no signal contribution orthogonal
to the signal space (by definition); second, for the WGN model, the noise component orthogonal
to the signal space carries no information regarding the noise vector in the signal space. As illustrated
in Figure 6.9, this enables us to reduce our infinite-dimensional problem to the following
finite-dimensional vector model, without loss of optimality.

Model  for  received  vector  in  signal  space 

$$H_i: \mathbf{Y} = \mathbf{s}_i + \mathbf{N}, \quad i = 0, 1, \ldots, M-1 \tag{6.23}$$

where $\mathbf{N} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$.

Two-dimensional modulation (Example 6.2.1 revisited): For a single symbol sent using
two-dimensional modulation, we have the hypotheses

$$H_{b_c, b_s}: y(t) = s_{b_c, b_s}(t) + n(t)$$

where

$$s_{b_c, b_s}(t) = b_c\, p(t) \cos 2\pi f_c t - b_s\, p(t) \sin 2\pi f_c t$$
Figure 6.10: A signal space view of QPSK. In the scenario shown, $\mathbf{s}_0$ is the transmitted vector,
and $\mathbf{Y} = \mathbf{s}_0 + \mathbf{N}$ is the received vector after noise is added. The noise components $N_c$, $N_s$ are
i.i.d. $N(0, \sigma^2)$ random variables.


Restricting attention to the two-dimensional signal space identified in the example, we obtain
the model

$$Y_c = b_c + N_c, \qquad Y_s = b_s + N_s$$

where we have absorbed scale factors into the symbol $(b_c, b_s)$, and where the I and Q noise components
$N_c$, $N_s$ are i.i.d. $N(0, \sigma^2)$. This is illustrated for QPSK in Figure 6.10. Thus, conditioned
on $(b_c, b_s)$, $Y_c \sim N(b_c, \sigma^2)$ and $Y_s \sim N(b_s, \sigma^2)$, and $Y_c$, $Y_s$ are conditionally independent. The
conditional density of $\mathbf{Y} = (Y_c, Y_s)^T$ conditioned on $(b_c, b_s)$ is therefore given by


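$$p(y_c, y_s \mid b_c, b_s) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_c - b_c)^2/(2\sigma^2)} \cdot \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_s - b_s)^2/(2\sigma^2)}$$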


We  can  now  infer  the  ML  and  MPE  rules  using  our  hypothesis  testing  framework.  However,  since 
the  same  reasoning  applies  to  signal  spaces  of  arbitrary  dimensions,  we  provide  a  more  general 
discussion  in  the  next  section,  and  then  return  to  examples  of  two-dimensional  modulation. 
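A minimal Python sketch of this conditional density, assuming the hypothetical QPSK scaling $b_c, b_s \in \{-1, +1\}$. Choosing the symbol with the largest density anticipates the ML rule derived in the next section, where it turns out to be the nearest constellation point.

```python
import math

def cond_density(yc, ys, bc, bs, sigma2):
    """p(yc, ys | bc, bs): product of the N(bc, sigma2) and N(bs, sigma2) densities."""
    gc = math.exp(-(yc - bc) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    gs = math.exp(-(ys - bs) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    return gc * gs

# Hypothetical QPSK scaling: b_c and b_s each take values in {-1, +1}.
QPSK = [(1, 1), (-1, 1), (-1, -1), (1, -1)]

def ml_symbol(yc, ys, sigma2=0.5):
    """Symbol (b_c, b_s) that maximizes the conditional density of the observation."""
    return max(QPSK, key=lambda b: cond_density(yc, ys, b[0], b[1], sigma2))

print(ml_symbol(0.3, -1.2))   # (1, -1)
```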


6.2.4  Optimal  Reception  in  AWGN 

We begin by characterizing the optimal receiver when the received signal is a finite-dimensional
vector. Using this, we infer the optimal receiver for continuous-time received signals.

Demodulation  for  M-ary  signaling  in  discrete  time  AWGN  corresponds  to  solving  an  M-ary 
hypothesis  testing  problem  with  observation  model  as  follows: 

$$H_i: \mathbf{Y} = \mathbf{s}_i + \mathbf{N}, \quad i = 0, 1, \ldots, M-1 \tag{6.24}$$

where $\mathbf{N} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$ is discrete time WGN. The ML and MPE rules for this problem are given
as follows. As usual, we denote the prior probabilities required to specify the MPE rule by
$\{\pi_i, i = 0, 1, \ldots, M-1\}$, with $\sum_{i=0}^{M-1} \pi_i = 1$.
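Optimal Demodulation for Signaling in Discrete Time AWGN

ML rule

$$\delta_{ML}(\mathbf{y}) = \arg\min_{0 \le i \le M-1} \|\mathbf{y} - \mathbf{s}_i\|^2 = \arg\max_{0 \le i \le M-1} \langle \mathbf{y}, \mathbf{s}_i\rangle - \frac{\|\mathbf{s}_i\|^2}{2} \tag{6.25}$$

MPE rule

$$\delta_{MPE}(\mathbf{y}) = \arg\min_{0 \le i \le M-1} \|\mathbf{y} - \mathbf{s}_i\|^2 - 2\sigma^2\log\pi_i = \arg\max_{0 \le i \le M-1} \langle \mathbf{y}, \mathbf{s}_i\rangle - \frac{\|\mathbf{s}_i\|^2}{2} + \sigma^2\log\pi_i \tag{6.26}$$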


Interpretation of optimal decision rules: The ML rule can be interpreted in two ways.
The first is as a minimum distance rule, choosing the transmitted signal which has minimum
Euclidean distance to the noisy received signal. The second is as a “template matcher”: choosing
the transmitted signal with highest correlation with the noisy received signal, while adjusting
for the fact that the energies of different transmitted signals may be different. The MPE rule
adjusts the ML cost function to reflect prior information: the adjustment term depends on the
noise level and the prior probabilities. The MPE cost functions decompose neatly into a sum of
the ML cost function (which depends on the observation) and a term reflecting prior knowledge
(which depends on the prior probabilities and the noise level). The latter term scales with the
noise variance $\sigma^2$. Thus, we rely more on the observation at high SNR (small $\sigma$), and more on
prior knowledge at low SNR (large $\sigma$).
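The two forms of the ML rule, and the MPE rule, can be sketched in a few lines of Python. The function names and the one-dimensional example are hypothetical illustrations; the example shows the prior term taking over at low SNR (large $\sigma^2$) and the observation taking over at high SNR.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def ml_min_distance(y, signals):
    """ML rule (6.25), minimum distance form."""
    return min(range(len(signals)), key=lambda i: sqdist(y, signals[i]))

def ml_correlator(y, signals):
    """ML rule (6.25), correlator form: maximize <y, s_i> - ||s_i||^2 / 2."""
    return max(range(len(signals)),
               key=lambda i: dot(y, signals[i]) - dot(signals[i], signals[i]) / 2)

def mpe_correlator(y, signals, priors, sigma2):
    """MPE rule (6.26): the ML statistic plus the prior term sigma2 * log(pi_i)."""
    return max(range(len(signals)),
               key=lambda i: dot(y, signals[i]) - dot(signals[i], signals[i]) / 2
               + sigma2 * math.log(priors[i]))

# Hypothetical one-dimensional example: s_0 = 0, s_1 = 2, received y = 1.1.
signals, y, priors = [(0.0,), (2.0,)], (1.1,), (0.9, 0.1)
print(ml_min_distance(y, signals), ml_correlator(y, signals))   # 1 1
print(mpe_correlator(y, signals, priors, sigma2=1.0))            # 0: prior dominates
print(mpe_correlator(y, signals, priors, sigma2=0.01))           # 1: observation dominates
```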

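Derivation of optimal receiver structures (6.25) and (6.26): Under hypothesis $H_i$, $\mathbf{Y}$ is
a Gaussian random vector with mean $\mathbf{s}_i$ and covariance matrix $\sigma^2\mathbf{I}$ (the translation of the noise
vector $\mathbf{N}$ by the deterministic signal vector $\mathbf{s}_i$ does not change the covariance matrix), so that

$$p_{\mathbf{Y}|i}(\mathbf{y} \mid H_i) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\|\mathbf{y} - \mathbf{s}_i\|^2/(2\sigma^2)} \tag{6.27}$$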


Plugging (6.27) into the ML rule (6.5), we obtain the rule (6.25) upon simplification. Similarly,
we obtain (6.26) by substituting (6.27) in the MPE rule (6.6). □
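The simplification can be spelled out as a short worked derivation. It assumes, as in the general hypothesis testing framework, that the ML rule (6.5) maximizes $p_{\mathbf{Y}|i}(\mathbf{y} \mid H_i)$ over $i$ and the MPE rule (6.6) maximizes $\pi_i\, p_{\mathbf{Y}|i}(\mathbf{y} \mid H_i)$. Taking logarithms of (6.27), expanding the squared distance, and dropping terms that do not depend on $i$ (namely $\frac{n}{2}\log(2\pi\sigma^2)$ and $\|\mathbf{y}\|^2$):

$$\log p_{\mathbf{Y}|i}(\mathbf{y} \mid H_i) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{\|\mathbf{y} - \mathbf{s}_i\|^2}{2\sigma^2}, \qquad \|\mathbf{y} - \mathbf{s}_i\|^2 = \|\mathbf{y}\|^2 - 2\langle \mathbf{y}, \mathbf{s}_i\rangle + \|\mathbf{s}_i\|^2$$

$$\arg\max_i \left[\log\pi_i - \frac{\|\mathbf{y} - \mathbf{s}_i\|^2}{2\sigma^2}\right] = \arg\min_i \left[\|\mathbf{y} - \mathbf{s}_i\|^2 - 2\sigma^2\log\pi_i\right] = \arg\max_i \left[\langle \mathbf{y}, \mathbf{s}_i\rangle - \frac{\|\mathbf{s}_i\|^2}{2} + \sigma^2\log\pi_i\right]$$

Setting $\pi_i = 1/M$ for all $i$ makes the prior term a constant and recovers (6.25).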


We  now  map  the  optimal  decision  rules  in  discrete  time  back  to  continuous  time  to  obtain  optimal 
detectors  for  the  original  continuous-time  model  (6.18),  as  follows. 


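Optimal Demodulation for Signaling in Continuous Time AWGN

ML rule

$$\delta_{ML}(y) = \arg\max_{0 \le i \le M-1} \langle y, s_i\rangle - \frac{\|s_i\|^2}{2} \tag{6.28}$$

MPE rule

$$\delta_{MPE}(y) = \arg\max_{0 \le i \le M-1} \langle y, s_i\rangle - \frac{\|s_i\|^2}{2} + \sigma^2\log\pi_i \tag{6.29}$$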

Derivation of optimal receiver structures (6.28) and (6.29): Due to the irrelevance of $y^{\perp}$,
the continuous time model (6.18) reduces to the discrete time model (6.24) by projecting onto
the signal space. It remains to map the optimal decision rules (6.25) and (6.26) for discrete time
observations back to continuous time. These rules involve correlations between the received and
transmitted signals, and the transmitted signal energies. It suffices to show that these quantities
are the same for both the continuous time model and the equivalent discrete time model. We
know now that signal inner products are preserved, so that

$$\|s_i\|^2 = \|\mathbf{s}_i\|^2$$

Further, the continuous-time correlator output can be written as

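$$\begin{aligned}
\langle y, s_i\rangle &= \langle y_S + y^{\perp}, s_i\rangle = \langle y_S, s_i\rangle + \langle y^{\perp}, s_i\rangle \\
&= \langle y_S, s_i\rangle = \langle \mathbf{y}, \mathbf{s}_i\rangle
\end{aligned}$$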

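where the second line uses $\langle y^{\perp}, s_i\rangle = 0$ (since $y^{\perp}$ is orthogonal to the signal space, which
contains $s_i$), and the last equality follows because the inner product between the signals $y_S$ and $s_i$ (which
both lie in the signal space) is the same as the inner product between their vector representations.

□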

Why don’t we have a “minimum distance” rule in continuous time? Notice that the
optimal decision rules for the continuous time model do not contain the continuous time version
of the minimum distance rule for discrete time. This is because of a technical subtlety. In
continuous time, the squares of the distances would be

$$\|y - s_i\|^2 = \|y_S - s_i\|^2 + \|y^{\perp}\|^2 = \|y_S - s_i\|^2 + \|n^{\perp}\|^2$$

Under the AWGN model, the noise power orthogonal to the signal space is infinite; hence, from
a purely mathematical point of view, the preceding quantities are infinite for each $i$ (so that we
cannot minimize over $i$). Hence, it only makes sense to talk about the minimum distance rule
in a finite-dimensional space in which the noise power is finite. The correlator-based form of
the optimal detector, on the other hand, automatically achieves the projection onto the finite-dimensional
signal space, and hence does not suffer from this technical difficulty. Of course, in
practice, even the continuous time received signal may be limited to a finite-dimensional space by
filtering and time-limiting, but correlator-based detection still has the practical advantage that
only components of the received signal which are truly useful appear in the decision statistics.


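Figure 6.11: The optimal receiver for an AWGN channel can be implemented using a bank of
correlators: a constant $a_i$ is subtracted from each correlator output $\langle y, s_i\rangle$, and the receiver
chooses the maximum as its decision. For the ML rule, the constants $a_i = \|s_i\|^2/2$; for the MPE rule,
$a_i = \|s_i\|^2/2 - \sigma^2\log\pi_i$.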


Bank of Correlators or Matched Filters: The optimal receiver involves computation of the
decision statistics

$$\langle y, s_i\rangle = \int y(t)\,s_i(t)\,dt$$

and can therefore be implemented using a bank of correlators, as shown in Figure 6.11. Of
course, any correlation operation can also be implemented using a matched filter, sampled at the
appropriate time. Defining $s_{i,mf}(t) = s_i(-t)$ as the impulse response of the filter matched to $s_i$,
we have

$$\langle y, s_i\rangle = \int y(t)\,s_i(t)\,dt = \int y(t)\,s_{i,mf}(-t)\,dt = (y * s_{i,mf})(0)$$

Figure 6.12 shows an alternative implementation for the optimal receiver using a bank of matched
filters.

Figure 6.12: An alternative implementation for the optimal receiver using a bank of matched
filters. For the ML rule, the constants $a_i = \|s_i\|^2/2$; for the MPE rule, $a_i = \|s_i\|^2/2 - \sigma^2\log\pi_i$.
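A small discrete-time sketch in Python confirms that sampling the matched filter output at time zero reproduces the correlator output. The samples and spacing `dt` are hypothetical; the only subtlety is keeping track of the time origin of the time-reversed filter.

```python
def correlate(y, s, dt):
    """Correlator output <y, s>, approximated by a Riemann sum."""
    return sum(a * b for a, b in zip(y, s)) * dt

def convolve(x, h):
    """Full discrete convolution: c[n] = sum_k x[k] h[n - k]."""
    c = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            c[k + m] += xk * hm
    return c

def matched_filter_sample(y, s, dt):
    """(y * s_mf)(0) with s_mf(t) = s(-t).

    y and s are sampled at t = 0, dt, 2 dt, ...; s_mf is then supported on
    t = -(L-1) dt, ..., 0 and stored as h[m] = s[L-1-m]. Output index n
    corresponds to time (n - (L-1)) dt, so t = 0 is index L-1.
    """
    L = len(s)
    h = s[::-1]
    return convolve(y, h)[L - 1] * dt

# Hypothetical samples.
y = [0.5, -1.0, 2.0, 0.3]
s = [1.0, 1.0, -1.0, 0.0]
print(correlate(y, s, 0.25), matched_filter_sample(y, s, 0.25))   # -0.625 -0.625
```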


Implementation in complex baseband: We have developed the optimal receiver structures
for real-valued signals, so that these apply to physical baseband and passband signals. However,
recall from Chapter 2 that correlation and filtering in passband, which is what the optimal receiver
does, can be implemented in complex baseband after downconversion. In particular, for passband
signals $u_p(t) = u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t$ and $v_p(t) = v_c(t)\cos 2\pi f_c t - v_s(t)\sin 2\pi f_c t$, the
inner product can be written as

$$\langle u_p, v_p\rangle = \frac{1}{2}\left(\langle u_c, v_c\rangle + \langle u_s, v_s\rangle\right) = \frac{1}{2}\,\mathrm{Re}\langle u, v\rangle \tag{6.30}$$

where $u = u_c + j u_s$ and $v = v_c + j v_s$ are the corresponding complex envelopes. Figure 6.13 shows
how a passband correlation can be implemented in complex baseband. Note that we correlate the
I component with the I component, and the Q component with the Q component. This is because
our optimal receiver is based on the assumption of coherent reception: our model assumes that
the receiver has exact copies of the noiseless transmitted signals. Thus, ideal carrier synchronism
is implicitly assumed in this model, so that the I and Q components do not get mixed up as they
would if the receiver’s LO were not synchronized to the incoming carrier.
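Relation (6.30) can be checked numerically. The Python sketch below uses hypothetical I and Q components that vary slowly compared with a hypothetical carrier frequency, and assumes the complex inner product convention $\langle u, v\rangle = \int u(t)\,v^*(t)\,dt$, under which $\mathrm{Re}\langle u, v\rangle = \langle u_c, v_c\rangle + \langle u_s, v_s\rangle$.

```python
import math

N, fc = 4000, 50.0                 # samples on [0, 1); hypothetical carrier frequency
dt = 1.0 / N
t = [k * dt for k in range(N)]

# Hypothetical lowpass I and Q components (1 Hz variations, far below fc).
uc = [math.cos(2 * math.pi * x) for x in t]
us = [0.5 + math.sin(2 * math.pi * x) for x in t]
vc = [1.0 + 2.0 * math.cos(2 * math.pi * x) for x in t]
vs = [math.sin(2 * math.pi * x) for x in t]

def upconvert(c, s):
    """Passband waveform c(t) cos(2 pi fc t) - s(t) sin(2 pi fc t)."""
    return [ci * math.cos(2 * math.pi * fc * x) - si * math.sin(2 * math.pi * fc * x)
            for ci, si, x in zip(c, s, t)]

up, vp = upconvert(uc, us), upconvert(vc, vs)
passband_ip = sum(a * b for a, b in zip(up, vp)) * dt

# Complex envelopes u = uc + j us, v = vc + j vs, with <u, v> = integral of u(t) v*(t) dt.
u = [complex(a, b) for a, b in zip(uc, us)]
v = [complex(a, b) for a, b in zip(vc, vs)]
baseband_ip = sum(a * b.conjugate() for a, b in zip(u, v)) * dt

print(passband_ip, 0.5 * baseband_ip.real)   # both approximately 0.75
```

Here $\langle u_c, v_c\rangle = 1$ and $\langle u_s, v_s\rangle = 1/2$, so both sides of (6.30) equal $3/4$; the double-frequency terms integrate to zero because the carrier is much faster than the baseband variations.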


Figure 6.13: The passband correlations required by the optimal receiver can be implemented in
complex baseband. Since the I and Q components are lowpass waveforms, correlation with them
is an implicit form of lowpass filtering. Thus, the LPFs after the mixers could potentially be
eliminated, which is why they are shown within dashed boxes.


6.2.5  Geometry  of  the  ML  decision  rule

The minimum distance interpretation for the ML decision rule implies that the decision regions
(in signal space) for M-ary signaling in AWGN are constructed as follows. Interpret the signal
vectors $\{\mathbf{s}_i\}$, and the received vector $\mathbf{y}$, as points in $n$-dimensional Euclidean space. When
deciding between any pair of signals $\mathbf{s}_i$ and $\mathbf{s}_j$ (which are points in $n$-dimensional space), we
draw a line between these points. The decision boundary is the perpendicular bisector of this
line, which is an $(n-1)$-dimensional hyperplane. This is illustrated in Figure 6.14, where, because
we are constrained to draw on two-dimensional paper, the hyperplane reduces to a line. But we
can visualize a plane containing the decision boundary coming out of the paper for a three-dimensional
signal space. While it is hard to visualize signal spaces of more than 3 dimensions,
the computation for deciding which side of the ML decision boundary the received vector $\mathbf{y}$ lies
on is straightforward: simply compare the Euclidean distances $\|\mathbf{y} - \mathbf{s}_i\|$ and $\|\mathbf{y} - \mathbf{s}_j\|$.

Figure 6.14: The ML decision boundary when testing between $\mathbf{s}_i$ and $\mathbf{s}_j$ is the perpendicular
bisector of the line joining the signal points, which is an $(n-1)$-dimensional hyperplane for an
$n$-dimensional signal space.


Figure 6.15: ML decision region $\Gamma_1$ for signal $\mathbf{s}_1$.


Figure 6.16: ML decision regions for some two-dimensional constellations.


The ML decision regions are constructed by drawing these pairwise decision boundaries. For any
given $i$, draw a line between $\mathbf{s}_i$ and $\mathbf{s}_j$ for all $j \neq i$. The perpendicular bisector of the line between
$\mathbf{s}_i$ and $\mathbf{s}_j$ defines two half-spaces (half-planes for $n = 2$), one in which we choose $\mathbf{s}_i$ over $\mathbf{s}_j$, the
other in which we choose $\mathbf{s}_j$ over $\mathbf{s}_i$. The intersection of the half-spaces in which $\mathbf{s}_i$ is chosen over
$\mathbf{s}_j$, for $j \neq i$, defines the decision region $\Gamma_i$. This procedure is illustrated for a two-dimensional
signal space in Figure 6.15. The line $L_{1i}$ is the perpendicular bisector of the line between $\mathbf{s}_1$ and
$\mathbf{s}_i$. The intersection of these lines defines $\Gamma_1$ as shown. Note that $L_{16}$ plays no role in determining
$\Gamma_1$, since signal $\mathbf{s}_6$ is “too far” from $\mathbf{s}_1$, in the following sense: if the received signal is closer to $\mathbf{s}_6$
than to $\mathbf{s}_1$, then it is also closer to $\mathbf{s}_i$ than to $\mathbf{s}_1$ for some $i = 2, 3, 4, 5$. This kind of observation
plays an important role in the performance analysis of ML reception in Section 6.3.

The preceding procedure can now be applied to the simpler scenario of two-dimensional constellations
to obtain ML decision regions as shown in Figure 6.16. For QPSK, the ML regions are
simply the four quadrants. For 8PSK, the ML regions are sectors of a circle. For 16QAM, the
ML regions take a rectangular form.
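For 16QAM, the rectangular shape of the ML regions means that the minimum distance decision separates into independent decisions on the I and Q coordinates. A Python sketch, assuming hypothetical levels $\{-3, -1, 1, 3\}$ on each axis, compares brute-force minimum distance decoding with per-axis slicing at the midpoints.

```python
LEVELS = (-3, -1, 1, 3)            # hypothetical 16QAM levels on each axis
QAM16 = [(a, b) for a in LEVELS for b in LEVELS]

def ml_min_distance(y):
    """Brute-force ML decision: the nearest of the 16 constellation points."""
    return min(QAM16, key=lambda s: (y[0] - s[0]) ** 2 + (y[1] - s[1]) ** 2)

def slice_axis(x):
    """Per-axis ML decision, with thresholds at the midpoints -2, 0, +2."""
    if x < -2:
        return -3
    if x < 0:
        return -1
    if x < 2:
        return 1
    return 3

def ml_rectangular(y):
    """ML decision using rectangular regions: slice I and Q separately."""
    return (slice_axis(y[0]), slice_axis(y[1]))

grid = [-4.9 + 0.37 * k for k in range(27)]      # test points that avoid the thresholds
mismatches = sum(ml_rectangular((a, b)) != ml_min_distance((a, b))
                 for a in grid for b in grid)
print(mismatches)   # 0
```

The same reasoning with a single threshold at zero on each axis gives the four quadrants of QPSK.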


6.3  Performance  Analysis  of  ML  Reception 

We  focus  on  performance  analysis  for  the  ML  decision  rule,  assuming  equal  priors  (for  which  the 
ML  rule  minimizes  the  error  probability).  The  analysis  for  MPE  reception  with  unequal  priors 
is  skipped,  but  it  is  a  simple  extension.  We  begin  with  a  geometric  picture  of  how  errors  are 
caused  by  WGN. 


6.3.1  The  Geometry  of  Errors 


Figure 6.17: Only the component of noise perpendicular to the decision boundary, $N_{perp}$, can
cause the received vector to cross the decision boundary, starting from the signal point $\mathbf{s}$.


In Figure 6.17, suppose that signal $\mathbf{s}$ is sent, and we wish to compute the probability that the
noise vector $\mathbf{N}$ causes the received vector to cross a given decision boundary. From the figure,
it is clear that $N_{perp}$, the projection of the noise vector perpendicular to the decision boundary,
is what determines whether or not we will cross the boundary. It does not matter what happens
with the component parallel to the boundary. While we draw the picture in two dimensions, the
same conclusion holds in general for an $n$-dimensional signal space, where $\mathbf{s}$ and $\mathbf{N}$ have dimension
$n$, the parallel component $\mathbf{N}_{par}$ has dimension $n - 1$, while $N_{perp}$ is still a scalar. Since $N_{perp} \sim N(0, \sigma^2)$
(the projection of WGN in any direction has this distribution), we have

$$P[\text{cross a boundary at distance } D] = P[N_{perp} > D] = Q\left(\frac{D}{\sigma}\right) \tag{6.31}$$
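A Monte Carlo sketch of (6.31) in Python for a two-dimensional signal space. The boundary orientation `theta`, the distance `D`, and `sigma` are hypothetical, and `Q` is computed from the standard library's `math.erfc` via $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$.

```python
import math
import random

def Q(x):
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def crossing_probability(D, sigma, theta, trials=100000, seed=2):
    """Monte Carlo estimate of P[<N, e> > D] for 2D noise N ~ N(0, sigma^2 I).

    e = (cos theta, sin theta) is a hypothetical unit normal to the boundary;
    only the projection of N onto e (that is, N_perp) enters the test.
    """
    rng = random.Random(seed)
    e = (math.cos(theta), math.sin(theta))
    hits = 0
    for _ in range(trials):
        n1, n2 = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        if n1 * e[0] + n2 * e[1] > D:       # N_perp exceeds the distance D
            hits += 1
    return hits / trials

est = crossing_probability(D=1.0, sigma=1.0, theta=0.7)
print(est, Q(1.0))   # the estimate should be close to Q(1), about 0.1587
```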
Figure 6.18: When making an ML decision between $\mathbf{s}_0$ and $\mathbf{s}_1$, the decision boundary is at
distance $D = d/2$ from each signal point, where $d = \|\mathbf{s}_1 - \mathbf{s}_0\|$ is the Euclidean distance between
the two points.


Now, let us apply the same reasoning to the decision boundary corresponding to making an
ML decision between two signals $\mathbf{s}_0$ and $\mathbf{s}_1$, as shown in Figure 6.18. Suppose that $\mathbf{s}_0$ is sent.
What is the probability that the noise vector $\mathbf{N}$, when added to it, sends the received vector into
the wrong region by crossing the decision boundary? We know from (6.31) that the answer is
$Q(D/\sigma)$, where $D$ is the distance between $\mathbf{s}_0$ and the decision boundary. For ML reception, the
decision boundary is the plane that is the perpendicular bisector of the line between $\mathbf{s}_0$ and $\mathbf{s}_1$,
whose length equals $d = \|\mathbf{s}_1 - \mathbf{s}_0\|$, the Euclidean distance between the two signal vectors. Thus,
$D = d/2 = \|\mathbf{s}_1 - \mathbf{s}_0\|/2$, and the probability of crossing the ML decision boundary between
the two signal vectors (starting from either of the two signal points) is

$$P[\text{cross ML boundary between } \mathbf{s}_0 \text{ and } \mathbf{s}_1] = Q\left(\frac{\|\mathbf{s}_1 - \mathbf{s}_0\|}{2\sigma}\right) = Q\left(\frac{\|s_1 - s_0\|}{2\sigma}\right) \tag{6.32}$$

where we note that the Euclidean distance between the signal vectors and the corresponding
continuous time signals is the same.

Notation: Now that we have established the equivalence between working with continuous time
signals and the vectors that represent their projections onto signal space, we no longer need to
be careful about distinguishing between them. Accordingly, we drop the use of boldface notation
henceforth, using the notation $y$, $s_i$ and $n$ to denote the received signal, the transmitted signal,
and the noise, respectively, in both settings.


6.3.2  Performance  with  binary  signaling 

Consider  binary  signaling  in  AWGN,  where  the  received  signal  is  modeled  using  two  hypotheses 
as  follows: 

$$\begin{aligned}
H_1&: y(t) = s_1(t) + n(t) \\
H_0&: y(t) = s_0(t) + n(t)
\end{aligned} \tag{6.33}$$

Geometric  computation  of  error  probability:  The  ML  decision  boundary  for  this  problem 
is  as  in  Figure  6.18.  The  conditional  error  probability  is  simply  the  probability  that,  starting  from 
one  of  the  signal  points,  the  noise  makes  us  cross  the  boundary  to  the  wrong  side,  the  probability 
of  which  we  have  already  computed  in  (6.32).  Since  the  conditional  error  probabilities  are  equal, 
they  also  equal  the  average  error  probability  regardless  of  the  priors.  We  therefore  obtain  the 
following  expression. 
Error probability for binary signaling with ML reception

$$P_{e,ML} = P_{e|1} = P_{e|0} = Q\left(\frac{\|s_1 - s_0\|}{2\sigma}\right) = Q\left(\frac{d}{2\sigma}\right) \tag{6.34}$$

where $d = \|s_1 - s_0\|$ is the distance between the two possible received signals.

Algebraic computation: While this geometric computation is intuitively pleasing, it is important
to also master algebraic approaches to computing the probabilities of errors due to WGN.
It is easiest to first consider on-off keying.


$$\begin{aligned}
H_1&: y(t) = s(t) + n(t) \\
H_0&: y(t) = n(t)
\end{aligned} \tag{6.35}$$

Applying (6.28), we find that the ML rule reduces to

$$\langle y, s\rangle \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\|s\|^2}{2} \tag{6.36}$$


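Setting $Z = \langle y, s\rangle$, we wish to compute the conditional error probabilities given by

$$P_{e|1} = P\left[Z < \frac{\|s\|^2}{2} \,\middle|\, H_1\right], \qquad P_{e|0} = P\left[Z > \frac{\|s\|^2}{2} \,\middle|\, H_0\right] \tag{6.37}$$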

We have actually already done these computations in Example 5.8.2, but it pays to review them
quickly. Note that, conditioned on either hypothesis, $Z$ is a Gaussian random variable. The
conditional mean and variance of $Z$ under $H_0$ are given by

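$$\begin{aligned}
E[Z|H_0] &= E[\langle n, s\rangle] = 0 \\
\mathrm{var}(Z|H_0) &= \mathrm{cov}(\langle n, s\rangle, \langle n, s\rangle) = \sigma^2\|s\|^2
\end{aligned}$$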

where we have used Theorem 6.2.1, and the fact that $n(t)$ has zero mean. The corresponding
computation under $H_1$ is as follows:

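$$\begin{aligned}
E[Z|H_1] &= E[\langle s + n, s\rangle] = \|s\|^2 \\
\mathrm{var}(Z|H_1) &= \mathrm{cov}(\langle s + n, s\rangle, \langle s + n, s\rangle) = \mathrm{cov}(\langle n, s\rangle, \langle n, s\rangle) = \sigma^2\|s\|^2
\end{aligned}$$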

noting that covariances do not change upon adding constants. Thus, $Z \sim N(0, v^2)$ under $H_0$ and
$Z \sim N(m, v^2)$ under $H_1$, where $m = \|s\|^2$ and $v^2 = \sigma^2\|s\|^2$. Substituting in (6.37), it is easy to
check that

$$P_{e|1} = P_{e|0} = Q\left(\frac{\|s\|}{2\sigma}\right) \tag{6.38}$$
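The check can be written out in two lines. For $Z \sim N(\mu, v^2)$, $P[Z > x] = Q((x - \mu)/v)$ and $P[Z < x] = Q((\mu - x)/v)$; with the threshold $m/2$ from (6.36):

$$P_{e|0} = P\left[Z > \tfrac{m}{2} \,\middle|\, H_0\right] = Q\left(\frac{m/2 - 0}{v}\right) = Q\left(\frac{m}{2v}\right), \qquad P_{e|1} = P\left[Z < \tfrac{m}{2} \,\middle|\, H_1\right] = Q\left(\frac{m - m/2}{v}\right) = Q\left(\frac{m}{2v}\right)$$

$$\frac{m}{2v} = \frac{\|s\|^2}{2\sigma\|s\|} = \frac{\|s\|}{2\sigma}$$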

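Going back to the more general binary signaling problem (6.33), the ML rule is given by (6.28)
to be

$$\langle y, s_1\rangle - \frac{\|s_1\|^2}{2} \underset{H_0}{\overset{H_1}{\gtrless}} \langle y, s_0\rangle - \frac{\|s_0\|^2}{2}$$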
We can analyze this system by considering the joint distribution of the correlator statistics $\langle y, s_1\rangle$
and $\langle y, s_0\rangle$, which are jointly Gaussian conditioned on each hypothesis. However, it is simpler
and more illuminating to rewrite the ML decision rule as

$$\langle y, s_1 - s_0\rangle \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\|s_1\|^2}{2} - \frac{\|s_0\|^2}{2}$$


This is consistent with the geometry depicted in Figure 6.18: only the projection of the received
signal along the line joining the signals matters in the decision, and hence only the noise along
this direction can produce errors. The analysis now involves the conditional distributions of the
single decision statistic $Z = \langle y, s_1 - s_0\rangle$, which is conditionally Gaussian under either hypothesis.
The computation of the conditional error probabilities is left as an exercise, but we already know
that the answer should work out to (6.34).

A quicker approach is to consider a transformed system with received signal $\tilde{y}(t) = y(t) - s_0(t)$.
Since this transformation is invertible, the performance of an optimal rule is unchanged under
it. But the transformed received signal $\tilde{y}(t)$ falls under the on-off signaling model (6.35), with
$s(t) = s_1(t) - s_0(t)$. The ML error probability formula (6.34) therefore follows from the formula
(6.38).
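A Monte Carlo sketch (Python) of the rule $\langle y, s_1 - s_0\rangle \gtrless (\|s_1\|^2 - \|s_0\|^2)/2$, working directly with signal vectors. The two vectors, which deliberately have unequal energies, and the noise level are hypothetical. Both conditional error probabilities should come out close to $Q(\|s_1 - s_0\|/(2\sigma))$, as (6.34) predicts.

```python
import math
import random

def Q(x):
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def binary_ml_error(s0, s1, sigma, sent, trials=100000, seed=3):
    """Monte Carlo estimate of P[error | s_sent] for the ML rule
    decide 1 if <y, s1 - s0> > (||s1||^2 - ||s0||^2) / 2, else decide 0."""
    rng = random.Random(seed)
    diff = [b - a for a, b in zip(s0, s1)]
    threshold = (sum(b * b for b in s1) - sum(a * a for a in s0)) / 2
    tx = s1 if sent == 1 else s0
    errors = 0
    for _ in range(trials):
        y = [c + rng.gauss(0.0, sigma) for c in tx]         # y = s_sent + noise
        decide_one = sum(a * b for a, b in zip(y, diff)) > threshold
        if decide_one != (sent == 1):
            errors += 1
    return errors / trials

# Hypothetical signal vectors with unequal energies; d = ||s1 - s0|| = sqrt(2).
s0, s1, sigma = (0.5, 0.0), (1.5, 1.0), 0.5
pe0 = binary_ml_error(s0, s1, sigma, sent=0)
pe1 = binary_ml_error(s0, s1, sigma, sent=1)
predicted = Q(math.dist(s0, s1) / (2 * sigma))
print(pe0, pe1, predicted)
```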

Scale Invariance: The formula (6.34) illustrates that the performance of the ML rule is scale-invariant:
if we scale the signals and noise by the same factor $A$, the performance does not
change, since both $\|s_1 - s_0\|$ and $\sigma$ scale by $A$. Thus, the performance is determined by the ratio
of signal and noise strengths, rather than by the signal and noise strengths individually. We now
define some standard measures for these quantities, and then express the performance of some
common binary signaling schemes in terms of them.

Energy per bit, $E_b$: For binary signaling, this is given by

$$E_b = \frac{1}{2}\left(\|s_0\|^2 + \|s_1\|^2\right)$$

assuming that 0 and 1 are equally likely to be sent.


Scale-invariant parameters: If we scale up both $s_1$ and $s_0$ by a factor $A$, $E_b$ scales up by a
factor $A^2$, while the distance $d$ scales up by a factor $A$. We can therefore define the scale-invariant
parameter

$$\eta_P = \frac{d^2}{E_b} \tag{6.39}$$


Now, substituting $d = \sqrt{\eta_P E_b}$ and $\sigma = \sqrt{N_0/2}$ into (6.34), we obtain that the ML performance
is given by

$$P_{e,ML} = Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right) \tag{6.40}$$


Two  important  observations  follow. 

Performance depends on signal-to-noise ratio: We observe from (6.40) that the performance
depends on the ratio $E_b/N_0$, rather than separately on the signal and noise strengths.

Power efficiency: For fixed $E_b/N_0$, the performance is better for a signaling scheme that has a
higher value of $\eta_P$. We therefore use the term power efficiency for $\eta_P = d^2/E_b$.

Let us now compute the performance of some common binary signaling schemes in terms of
$E_b/N_0$, using (6.40). Since inner products (and hence energies and distances) are preserved in
signal space, we can compute $\eta_P$ for each scheme using the signal space representations depicted
in Figure 6.19. The absolute scale of the signals is irrelevant, since the performance depends on
the signaling scheme only through the scale-invariant parameter $\eta_P = d^2/E_b$. We can therefore
choose any convenient scaling for the signal space representation for a modulation scheme.


Figure 6.19: Signal space representations with conveniently chosen scaling for three binary signaling
schemes (panels: on-off keying, antipodal signaling, and equal-energy orthogonal signaling).


On-off keying: Here $s_1(t) = s(t)$ and $s_0(t) = 0$. As shown in Figure 6.19, the signal space is
one-dimensional. For the scaling in the figure, we have $d = 1$ and $E_b = \frac{1}{2}(1^2 + 0^2) = \frac{1}{2}$, so that
$\eta_P = d^2/E_b = 2$. Substituting into (6.40), we obtain $P_{e,ML} = Q\left(\sqrt{E_b/N_0}\right)$.


Antipodal signaling: Here $s_1(t) = -s_0(t)$, leading again to a one-dimensional signal space
representation. One possible realization of antipodal signaling is BPSK, discussed in the previous
chapter. For the scaling chosen, $d = 2$ and $E_b = \frac{1}{2}(1^2 + (-1)^2) = 1$, which gives $\eta_P = d^2/E_b = 4$.
Substituting into (6.40), we obtain $P_{e,ML} = Q\left(\sqrt{2E_b/N_0}\right)$.


Equal-energy, orthogonal signaling: Here $s_1$ and $s_0$ are orthogonal, with $\|s_1\|^2 = \|s_0\|^2$.
This is a two-dimensional signal space. As discussed in the previous chapter, possible realizations
of orthogonal signaling include FSK and Walsh-Hadamard codes. From Figure 6.19, we have
$d = \sqrt{2}$ and $E_b = 1$, so that $\eta_P = d^2/E_b = 2$. This gives $P_{e,ML} = Q\left(\sqrt{E_b/N_0}\right)$.


Thus, on-off keying (which is orthogonal signaling with unequal energies) and equal-energy orthogonal
signaling have the same power efficiency, while the power efficiency of antipodal signaling
is a factor of two (i.e., 3 dB) better.
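The three power efficiencies can be computed directly from the signal space points of Figure 6.19. A short Python sketch; the function names are hypothetical, and $E_b$ assumes equally likely bits as above.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def power_efficiency(s0, s1):
    """eta_P = d^2 / E_b with E_b = (||s0||^2 + ||s1||^2) / 2 (equal priors)."""
    d2 = sum((b - a) ** 2 for a, b in zip(s0, s1))
    eb = 0.5 * (sum(a * a for a in s0) + sum(b * b for b in s1))
    return d2 / eb

def pe_ml(eta_p, ebn0):
    """ML error probability (6.40); ebn0 is E_b/N_0 as a linear ratio, not in dB."""
    return Q(math.sqrt(eta_p * ebn0 / 2))

# Signal space points with the scaling of Figure 6.19.
schemes = {
    "on-off keying": ((0.0,), (1.0,)),
    "antipodal": ((-1.0,), (1.0,)),
    "equal-energy orthogonal": ((1.0, 0.0), (0.0, 1.0)),
}
for name, (s0, s1) in schemes.items():
    print(name, power_efficiency(s0, s1))   # 2.0, 4.0, 2.0
```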


In plots of error probability versus SNR, we typically express error probability on a log scale (in
order to capture its rapid decay with SNR) and SNR in decibels (in order to span a
large range). We provide such a plot for antipodal and orthogonal signaling in Figure 6.20.


6.3.3  M-ary  signaling:  scale-invariance  and  SNR 

We  tnrn  now  to  M-ary  signaling  with  M  >  2,  modeled  as  the  following  hypothesis  testing 
problem. 

Hi  :  y{t)  =  Si{t)  +  n{t),  i  =  0, 1, ...,  M  -  1 
for  which  the  ML  rnle  has  been  derived  to  be 

duLiv)  =  arg  maxo<i<^_i  Zi 

with  decision  statistics 

Zi  =  {y,Si)  - -\\si\‘^  ,  i  =  0, 1,  ...,M  -  1 
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Figure  6.20:  Error  probability  versus  Eb/No  (dB)  for  binary  antipodal  and  orthogonal  signaling 
schemes. 


and  corresponding  decision  regions 

Ti  =  {y  :  Zi  >  Zj  for  all  j  i}  ,  i  =  0, 1, ...,  M  —  1  (6.41) 

Before  doing  detailed  computations,  let  us  discuss  some  general  properties  that  greatly  simplify 
the  framework  for  performance  analysis. 

Scale Invariance: For binary signaling, we have observed through explicit computation of the error probability that performance depends only on the signal-to-noise ratio ($E_b/N_0$) and the geometry of the signal set (which determines the power efficiency $\eta_P = d^2/E_b$). Actually, we can make such statements in great generality for M-ary signaling without explicit computations. First, let us note that the performance of an optimal receiver does not change if we scale both signal and noise by the same factor. Specifically, optimal reception for the model

$$H_i: y(t) = A s_i(t) + A n(t), \quad i = 0, 1, \ldots, M-1 \qquad (6.42)$$

does not depend on $A$ (for any $A \neq 0$). This is inferred from the following general observation: the performance of an optimal receiver is unchanged when we pass the observation through an invertible transformation. Specifically, suppose $z(t) = F(y(t))$ is obtained by passing $y(t)$ through an invertible transformation $F$. If the optimal receiver for $z$ does better than the optimal receiver for $y$, then we could apply $F$ to $y$ to get $z$, then do optimal reception for $z$. This would perform better than the optimal receiver for $y$, which is a contradiction. Similarly, if the optimal receiver for $y$ does better than the optimal receiver for $z$, then we could apply $F^{-1}$ to $z$ to get $y$, and then do optimal reception for $y$ to perform better than the optimal receiver for $z$, again a contradiction. Scaling the observation by $A$ is such an invertible transformation.

The preceding argument implies that performance depends only on the signal-to-noise ratio, once we have fixed the signal constellation. Let us now figure out what properties of the signal constellation are relevant in determining performance. For $M = 2$, we have seen that all that matters is the scale-invariant quantity $d^2/E_b$. What are the analogous quantities for $M > 2$? To determine these, let us consider the conditional error probabilities for the ML rule.

Conditional error probability: The conditional error probability, conditioned on $H_i$, is given by

$$P_{e|i} = P[y \notin \Gamma_i \,|\, i \text{ sent}] = P[Z_i < Z_j \text{ for some } j \neq i \,|\, i \text{ sent}] \qquad (6.43)$$

While computation of the conditional error probability in closed form is typically not feasible, we can actually get significant insight into what parameters it depends on by examining the conditional distributions of the decision statistics. Since $y = s_i + n$ conditioned on $H_i$, the decision statistics are given by

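$$Z_j = \langle y, s_j\rangle - \frac{\|s_j\|^2}{2} = \langle s_i + n, s_j\rangle - \frac{\|s_j\|^2}{2} = \langle n, s_j\rangle + \langle s_i, s_j\rangle - \frac{\|s_j\|^2}{2}, \quad 0 \le j \le M-1$$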

By the Gaussianity of $n(t)$, the decision statistics $\{Z_j\}$ are jointly Gaussian (conditioned on $H_i$). Their joint distribution is therefore completely characterized by their means and covariances. Since the noise is zero mean, we obtain

$$E[Z_j \,|\, H_i] = \langle s_i, s_j\rangle - \frac{\|s_j\|^2}{2}$$

Using Theorem 6.2.1, and noting that covariance is unaffected by translation, we obtain that

$$\mathrm{cov}(Z_j, Z_k \,|\, H_i) = \mathrm{cov}(\langle n, s_j\rangle, \langle n, s_k\rangle) = \sigma^2 \langle s_j, s_k\rangle$$

Thus, conditioned on $H_i$, the joint distribution of $\{Z_j\}$ depends only on the noise variance $\sigma^2$ and the signal inner products $\{\langle s_i, s_j\rangle, 0 \le i, j \le M-1\}$ (note that $\|s_j\|^2 = \langle s_j, s_j\rangle$ is itself an inner product). Now that we know the joint distribution, we can in principle compute the conditional error probabilities $P_{e|i}$. In practice, this is often difficult, and we often resort to Monte Carlo simulations. However, what we have found out about the joint distribution can now be used to refine our concepts of scale-invariance.
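The mean and covariance formulas can be sanity-checked by simulation. The sketch below is a minimal Python illustration under stated assumptions: a hypothetical three-point constellation in a two-dimensional signal space, noise represented by its projections onto that space (i.i.d. $N(0, \sigma^2)$ per dimension), and $H_0$ true. It compares Monte Carlo moments of $\{Z_j\}$ with $\langle s_i, s_j\rangle - \|s_j\|^2/2$ and $\sigma^2\langle s_j, s_k\rangle$; all names are illustrative.

```python
import random

def inner(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))

def decision_stats(y, signals):
    """Z_j = <y, s_j> - ||s_j||^2 / 2 for every signal s_j."""
    return [inner(y, s) - 0.5 * inner(s, s) for s in signals]

def theoretical_moments(signals, i, sigma):
    """Mean <s_i, s_j> - ||s_j||^2/2 and covariance sigma^2 <s_j, s_k>, given H_i."""
    mean = [inner(signals[i], sj) - 0.5 * inner(sj, sj) for sj in signals]
    cov = [[sigma ** 2 * inner(sj, sk) for sk in signals] for sj in signals]
    return mean, cov

def empirical_moments(signals, i, sigma, trials=100_000, seed=1):
    """Monte Carlo estimates of the same moments (noise i.i.d. N(0, sigma^2) per dimension)."""
    rng = random.Random(seed)
    M = len(signals)
    sums = [0.0] * M
    prods = [[0.0] * M for _ in range(M)]
    for _ in range(trials):
        y = [x + rng.gauss(0.0, sigma) for x in signals[i]]
        z = decision_stats(y, signals)
        for j in range(M):
            sums[j] += z[j]
            for k in range(M):
                prods[j][k] += z[j] * z[k]
    mean = [s / trials for s in sums]
    cov = [[prods[j][k] / trials - mean[j] * mean[k] for k in range(M)] for j in range(M)]
    return mean, cov

# Hypothetical constellation, chosen only for illustration.
SIGNALS = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
SIGMA = 0.5
mean_th, cov_th = theoretical_moments(SIGNALS, 0, SIGMA)
mean_mc, cov_mc = empirical_moments(SIGNALS, 0, SIGMA)
print("theory mean:", mean_th)
print("MC mean:    ", [round(v, 3) for v in mean_mc])
```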

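Performance only depends on normalized inner products: Let us replace $Z_j$ by $Z_j/\sigma^2$. Clearly, since we are simply picking the maximum among the decision statistics, scaling by a common positive factor does not change the decision (and hence the performance). However, we now obtain that

$$E\left[\frac{Z_j}{\sigma^2} \,\Big|\, H_i\right] = \frac{\langle s_i, s_j\rangle}{\sigma^2} - \frac{1}{2}\frac{\|s_j\|^2}{\sigma^2}$$

and

$$\mathrm{cov}\left(\frac{Z_j}{\sigma^2}, \frac{Z_k}{\sigma^2} \,\Big|\, H_i\right) = \frac{1}{\sigma^4}\mathrm{cov}(Z_j, Z_k \,|\, H_i) = \frac{\langle s_j, s_k\rangle}{\sigma^2}$$

Thus, the joint distribution of the normalized decision statistics $\{Z_j/\sigma^2\}$, conditioned on any of the hypotheses, depends only on the normalized inner products $\{\langle s_i, s_j\rangle/\sigma^2, 0 \le i, j \le M-1\}$. Of course, this means that the performance also depends only on these normalized inner products.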

Let us now carry these arguments further, still without any explicit computations. We define energy per symbol and energy per bit for M-ary signaling as follows.

Energy per symbol, $E_s$: For M-ary signaling with equal priors, the energy per symbol $E_s$ is given by

$$E_s = \frac{1}{M}\sum_{i=0}^{M-1} \|s_i\|^2$$

Energy per bit, $E_b$: Since M-ary signaling conveys $\log_2 M$ bits/symbol, the energy per bit is given by

$$E_b = \frac{E_s}{\log_2 M}$$


If all signals in an M-ary constellation are scaled up by a factor $A$, then $E_s$ and $E_b$ get scaled up by $A^2$, as do all inner products $\{\langle s_i, s_j\rangle\}$. Thus, we can define scale-invariant inner products $\{\langle s_i, s_j\rangle/E_b\}$ that depend only on the shape of the signal constellation. Indeed, we can define the shape of a constellation as these scale-invariant inner products. Setting $\sigma^2 = N_0/2$, we can now write the normalized inner products determining performance as follows:

$$\frac{\langle s_i, s_j\rangle}{\sigma^2} = \frac{\langle s_i, s_j\rangle}{E_b}\,\frac{2E_b}{N_0} \qquad (6.44)$$
We can now make the following statement.

Performance depends only on $E_b/N_0$ and constellation shape (as specified by the scale-invariant inner products): We have shown that the performance depends only on the normalized inner products $\{\langle s_i, s_j\rangle/\sigma^2\}$. From (6.44), we see that these in turn depend only on $E_b/N_0$ and the scale-invariant inner products $\{\langle s_i, s_j\rangle/E_b\}$. The latter depend only on the shape of the signal constellation, and are completely independent of the signal and noise strengths. What this means is that we can choose any convenient scaling for the signal constellation when investigating its performance, as long as we keep track of the signal-to-noise ratio. We illustrate this via an example where we determine the error probability by simulation.
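Before the simulation example, here is a small Python sketch of the statement itself, assuming equal priors and a QPSK constellation chosen only for illustration. It fixes $E_b/N_0$, sets $\sigma^2 = N_0/2$ accordingly, and checks that the normalized inner products $\langle s_i, s_j\rangle/\sigma^2$ of (6.44) do not change when every signal is scaled by a hypothetical factor $A = 7.5$.

```python
import math

def energy_per_bit(signals):
    """E_b = E_s / log2(M), with E_s averaged over equally likely signals."""
    M = len(signals)
    es = sum(sum(x * x for x in s) for s in signals) / M
    return es / math.log2(M)

def normalized_inner_products(signals, ebno_db):
    """Matrix of <s_i, s_j> / sigma^2, where sigma^2 = N0/2 is set to meet the target Eb/N0."""
    n0 = energy_per_bit(signals) / 10 ** (ebno_db / 10)
    sigma2 = n0 / 2
    return [[sum(a * b for a, b in zip(si, sj)) / sigma2 for sj in signals] for si in signals]

qpsk = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
scaled = [tuple(7.5 * x for x in s) for s in qpsk]  # hypothetical scale factor A = 7.5
g1 = normalized_inner_products(qpsk, 6.0)
g2 = normalized_inner_products(scaled, 6.0)
print(g1[0])
print(g2[0])  # same numbers: only the shape and Eb/N0 matter
```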


Example 6.3.1 (Using scale-invariance in error probability simulations): Suppose that we wish to estimate the error probability for 8PSK by simulation. The signal points lie in a 2-dimensional space, and we can scale them to lie on a circle of unit radius, so that the constellation is given by $\mathcal{A} = \{(\cos\theta, \sin\theta)^T : \theta = k\pi/4,\ k = 0, 1, \ldots, 7\}$. The energy per symbol is $E_s = 1$ for this scaling, so that $E_b = E_s/\log_2 8 = 1/3$. We therefore have $E_b/N_0 = 1/(3N_0) = 1/(6\sigma^2)$, so that the noise variance per dimension can be set to

$$\sigma^2 = \frac{1}{6\,(E_b/N_0)}$$

Typically, $E_b/N_0$ is specified in dB, so we need to convert it to the "raw" value $E_b/N_0 = 10^{(E_b/N_0)_{\mathrm{dB}}/10}$. We now have a simulation consisting of the following steps, repeated over multiple symbol transmissions:

Step 1: Choose a symbol $s$ at random from $\mathcal{A}$. For this symmetric constellation, we can actually keep sending the same symbol in order to compute the performance of the ML rule, since the conditional error probabilities are all equal. For example, set $s = (1, 0)^T$.

Step 2: Generate two i.i.d. $N(0,1)$ random variables $U_c$ and $U_s$. The I and Q noises can now be set as $N_c = \sigma U_c$ and $N_s = \sigma U_s$, so that $N = (N_c, N_s)^T$.

Step 3: Set the received vector $y = s + N$.

Step 4: Compute the ML decision $\arg\max_j \langle y, s_j\rangle$ (the energy terms can be dropped, since the signals are of equal energy) or, equivalently, $\arg\min_j \|y - s_j\|^2$.

Step 5: If there is an error, increment the error count.

The error probability is estimated as the error count divided by the number of symbols transmitted. We repeat this simulation over a range of $E_b/N_0$, and typically plot the error probability on a log scale versus $E_b/N_0$ in dB.


These  steps  are  carried  out  in  the  following  code  fragment,  which  generates  Figure  6.21  comparing 
a  simulation-based  estimate  of  the  error  probability  for  8PSK  against  the  intelligent  union  bound, 
an  analytical  estimate  that  we  develop  shortly.  The  analytical  estimate  requires  very  little 
computation  (evaluation  of  a  single  Q  function),  but  its  agreement  with  simulations  is  excellent. 
As  we  shall  see,  developing  such  analytical  estimates  also  gives  us  insight  into  how  errors  are 
most  likely  to  occur  for  M-ary  signaling  in  AWGN. 

The code fragment is written for transparency rather than computational efficiency. The code contains an outer for-loop for varying SNR, and an inner for-loop for computing minimum distances for the symbols sent at each SNR. The inner loop can be avoided and the program sped up considerably by computing all minimum distances for all symbols at once using matrix operations (try it!). We use a less efficient program here to make the operations easy to understand.

Code  Fragment  6.3.1  (Simulation  of  8PSK  performance  in  AWGN) 

%generate 8PSK constellation as complex numbers
a = cumsum(ones(8,1)) - 1;
constellation = exp(1i*2*pi.*a/8);
%number of symbols in simulation
nsymbols = 20000;
ebnodb = 0:0.1:10;
number_snrs = length(ebnodb);
perr_estimate = zeros(number_snrs,1);
for k = 1:number_snrs, %SNR for loop
    ebnodb_now = ebnodb(k);
    ebno = 10^(ebnodb_now/10);
    sigma = sqrt(1/(6*ebno));
    %send first symbol without loss of generality, add 2d Gaussian noise
    received = 1 + sigma*randn(nsymbols,1) + 1j*sigma*randn(nsymbols,1);
    decisions = zeros(nsymbols,1);
    for n = 1:nsymbols, %Symbol for loop (can/should be avoided for fast implementation)
        distances = abs(received(n) - constellation);
        [min_dist, decisions(n)] = min(distances);
    end
    errors = (decisions ~= 1);
    perr_estimate(k) = sum(errors)/nsymbols;
end

semilogy(ebnodb, perr_estimate);
hold on;

%COMPARE WITH INTELLIGENT UNION BOUND
q_function = @(x) 0.5*erfc(x/sqrt(2)); %Q function via erfc, so the fragment runs on its own
etaP = 6 - 3*sqrt(2); %power efficiency
Ndmin = 2; %number of nearest neighbors
ebno = 10.^(ebnodb/10);
perr_union = Ndmin*q_function(sqrt(etaP*ebno/2));
semilogy(ebnodb, perr_union, ':');
xlabel('Eb/N0 (dB)');
ylabel('Symbol error probability');
legend('Simulation', 'Intelligent Union Bound', 'Location', 'NorthEast');

Figure 6.21: Error probability for 8PSK.
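For readers working outside MATLAB, here is a rough Python counterpart of Code Fragment 6.3.1 (standard library only; the function names are ours, not the text's). It evaluates one $E_b/N_0$ at a time rather than sweeping a range, uses the same unit-radius scaling and $\sigma^2 = 1/(6E_b/N_0)$, and compares against the intelligent union bound with $\eta_P = 6 - 3\sqrt{2}$ and $\bar{N}_{d_{min}} = 2$, the values used in the code fragment.

```python
import cmath
import math
import random

def q_func(x):
    """Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# 8PSK on the unit circle: E_s = 1, so E_b = 1/3.
CONSTELLATION = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]

def simulate_8psk_ser(ebno_db, nsymbols=20000, seed=0):
    """Monte Carlo symbol error rate with minimum-distance (ML) decisions."""
    rng = random.Random(seed)
    sigma = math.sqrt(1 / (6 * 10 ** (ebno_db / 10)))  # sigma^2 = 1 / (6 Eb/N0)
    errors = 0
    for _ in range(nsymbols):
        # Always send symbol 0; all conditional error probabilities are equal.
        y = CONSTELLATION[0] + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        decision = min(range(8), key=lambda m: abs(y - CONSTELLATION[m]))
        errors += decision != 0
    return errors / nsymbols

def iub_8psk(ebno_db):
    """Intelligent union bound 2 Q(sqrt(eta_P Eb / (2 N0))) with eta_P = 6 - 3 sqrt(2)."""
    ebno = 10 ** (ebno_db / 10)
    return 2 * q_func(math.sqrt((6 - 3 * math.sqrt(2)) * ebno / 2))

for snr_db in (4.0, 8.0):
    print(f"{snr_db} dB: simulated {simulate_8psk_ser(snr_db):.4f}, IUB {iub_8psk(snr_db):.4f}")
```

Removing the per-symbol loop (as the text suggests for the MATLAB version) would typically mean using an array library; the plain loop is kept here for the same transparency reasons.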
6.3.4 Performance analysis for M-ary signaling

We begin by computing the error probability for QPSK, for which we can get simple expressions for the error probability in terms of the Q function. We then discuss why exact performance analysis can be more complicated in general, motivating the need for the bounds and approximations we develop in this section.


Figure 6.22: If $s_0$ is sent, an error occurs if $N_c$ or $N_s$ is negative enough to make the received vector fall out of the first quadrant.

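Exact analysis for QPSK: Let us find $P_{e|0}$, the conditional error probability for the ML rule conditioned on $s_0$ being sent. For the scaling shown in Figure 6.22,

$$s_0 = \left(\frac{d}{2}, \frac{d}{2}\right)^T$$

and the two-dimensional received vector is given by

$$y = s_0 + (N_c, N_s)^T = \left(N_c + \frac{d}{2},\ N_s + \frac{d}{2}\right)^T$$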


where $N_c$, $N_s$ are i.i.d. $N(0, \sigma^2)$ random variables, corresponding to the projections of WGN along the I and Q axes, respectively. An error occurs if the noise moves the observation out of the positive quadrant, which is the decision region for $s_0$. This happens if $N_c + \frac{d}{2} < 0$ or $N_s + \frac{d}{2} < 0$. We can therefore write


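$$P_{e|0} = P\left[N_c + \tfrac{d}{2} < 0 \text{ or } N_s + \tfrac{d}{2} < 0\right] = P\left[N_c + \tfrac{d}{2} < 0\right] + P\left[N_s + \tfrac{d}{2} < 0\right] - P\left[N_c + \tfrac{d}{2} < 0 \text{ and } N_s + \tfrac{d}{2} < 0\right] \qquad (6.45)$$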


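But

$$P\left[N_c + \tfrac{d}{2} < 0\right] = P\left[N_c < -\tfrac{d}{2}\right] = \Phi\left(-\frac{d}{2\sigma}\right) = Q\left(\frac{d}{2\sigma}\right)$$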


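This is also equal to $P[N_s + \frac{d}{2} < 0]$, since $N_c$, $N_s$ are identically distributed. Furthermore, since $N_c$, $N_s$ are independent, we have

$$P\left[N_c + \tfrac{d}{2} < 0 \text{ and } N_s + \tfrac{d}{2} < 0\right] = P\left[N_c + \tfrac{d}{2} < 0\right] P\left[N_s + \tfrac{d}{2} < 0\right] = Q^2\left(\frac{d}{2\sigma}\right)$$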

Substituting these expressions into (6.45), we obtain that

$$P_{e|0} = 2Q\left(\frac{d}{2\sigma}\right) - Q^2\left(\frac{d}{2\sigma}\right) \qquad (6.46)$$
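By symmetry, the conditional probabilities $P_{e|i}$ are equal for all $i$, which implies that the average error probability is also given by the expression above. We now express the error probability in terms of the scale-invariant parameter $d^2/E_b$ and $E_b/N_0$, using the relation

$$\frac{d}{2\sigma} = \sqrt{\frac{d^2}{E_b}\,\frac{E_b}{2N_0}}$$

The energy per symbol is given by

$$E_s = \|s_0\|^2 = \left(\frac{d}{2}\right)^2 + \left(\frac{d}{2}\right)^2 = \frac{d^2}{2}$$

which implies that the energy per bit is

$$E_b = \frac{E_s}{\log_2 M} = \frac{E_s}{\log_2 4} = \frac{d^2}{4}$$

This yields $d^2/E_b = 4$, and hence $\frac{d}{2\sigma} = \sqrt{\frac{2E_b}{N_0}}$. Substituting into (6.46), we obtain

$$P_e = 2Q\left(\sqrt{\frac{2E_b}{N_0}}\right) - Q^2\left(\sqrt{\frac{2E_b}{N_0}}\right) \qquad (6.47)$$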
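Expression (6.47) can be checked by simulating exactly the event analyzed above: send $s_0 = (d/2, d/2)^T$ and count how often the received vector leaves the first quadrant. The Python sketch below assumes the scaling $d = 2$ (so $E_b = d^2/4 = 1$ and $\sigma^2 = 1/(2E_b/N_0)$); the function names are illustrative.

```python
import math
import random

def q_func(x):
    """Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def qpsk_exact(ebno_db):
    """(6.47): Pe = 2Q(a) - Q(a)^2 with a = sqrt(2 Eb/N0)."""
    a = math.sqrt(2 * 10 ** (ebno_db / 10))
    return 2 * q_func(a) - q_func(a) ** 2

def qpsk_monte_carlo(ebno_db, nsymbols=200_000, seed=0):
    """Estimate P_{e|0}: send s0 = (1, 1), i.e. d = 2 and Eb = 1, and count quadrant exits."""
    rng = random.Random(seed)
    sigma = math.sqrt(1 / (2 * 10 ** (ebno_db / 10)))  # sigma^2 = N0/2 with N0 = Eb/(Eb/N0)
    errors = 0
    for _ in range(nsymbols):
        yc = 1.0 + rng.gauss(0.0, sigma)  # N_c + d/2
        ys = 1.0 + rng.gauss(0.0, sigma)  # N_s + d/2
        errors += (yc < 0) or (ys < 0)
    return errors / nsymbols

pe_exact = qpsk_exact(4.0)
pe_sim = qpsk_monte_carlo(4.0)
print(f"exact {pe_exact:.4e}, simulated {pe_sim:.4e}")
```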


Figure 6.23: The noise random variables $N_1, N_2, N_3$, which can drive the received vector outside the decision region $\Gamma_0$, are correlated, which makes it difficult to find an exact expression for $P_{e|0}$.


Why exact analysis can be difficult: Let us first understand why we could find a simple expression for the error probability for QPSK. The decision regions are bounded by the I and Q axes. The noise random variable $N_c$ can cause crossing of the Q axis, while $N_s$ can cause crossing of the I axis. Since these two random variables are independent, the probability that at least one of these noise random variables causes a boundary crossing becomes easy to compute. Figure 6.23 shows an example where this is not possible. In the figure, we see that the decision region $\Gamma_0$ is bounded by three lines (in general, these would be $(n-1)$-dimensional hyperplanes in $n$-dimensional signal space). An error occurs if we cross any of these lines, starting from $s_0$. In order to cross the line between $s_0$ and $s_i$, the noise random variable $N_i$ must be bigger than $\|s_i - s_0\|/2$, $i = 1, 2, 3$ (as we saw in Figures 6.17 and 6.18, only the noise component orthogonal to a hyperplane determines whether we cross it). Thus, the conditional error probability can be written as

$$P_{e|0} = P\left[N_1 > \frac{\|s_1 - s_0\|}{2} \text{ or } N_2 > \frac{\|s_2 - s_0\|}{2} \text{ or } N_3 > \frac{\|s_3 - s_0\|}{2}\right] \qquad (6.48)$$

The random variables $N_1, N_2, N_3$ are, of course, jointly Gaussian, since each is a projection of WGN along a direction. Each of them is an $N(0, \sigma^2)$ random variable; that is, they are identically distributed. However, they are not independent, since they are projections of WGN along directions that are not orthogonal to each other. Thus, we cannot break down the preceding expression into probabilities in terms of the individual random variables $N_1, N_2, N_3$, unlike what we did for QPSK (where $N_c$, $N_s$ were independent). However, we can still find a simple upper bound on the conditional error probability using the union bound, as follows.

Union Bound: The probability of a union of events is upper bounded by the sum of the probabilities of the events.

$$P[A_1 \text{ or } A_2 \text{ or } \ldots \text{ or } A_n] = P[A_1 \cup A_2 \cup \ldots \cup A_n] \le P[A_1] + P[A_2] + \ldots + P[A_n] \qquad (6.49)$$

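Applying (6.49) to (6.48), we obtain that, for the scenario depicted in Figure 6.23, the conditional error probability can be upper bounded as follows:

$$P_{e|0} \le P\left[N_1 > \frac{\|s_1 - s_0\|}{2}\right] + P\left[N_2 > \frac{\|s_2 - s_0\|}{2}\right] + P\left[N_3 > \frac{\|s_3 - s_0\|}{2}\right] = Q\left(\frac{\|s_1 - s_0\|}{2\sigma}\right) + Q\left(\frac{\|s_2 - s_0\|}{2\sigma}\right) + Q\left(\frac{\|s_3 - s_0\|}{2\sigma}\right) \qquad (6.50)$$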

Thus, the conditional error probability is upper bounded by a sum of probabilities, each of which corresponds to the error probability for a binary decision: $s_0$ versus $s_1$, $s_0$ versus $s_2$, and $s_0$ versus $s_3$. This approach applies in great generality, as we show next.

Union Bound and variants: Pictures such as the one in Figure 6.23 typically cannot be drawn when the signal space dimension is high. However, we can still find union bounds on error probabilities, as long as we can enumerate all the signals in the constellation. To do this, let us rewrite (6.43), the conditional error probability, conditioned on $H_i$, as a union of $M - 1$ events as follows:

$$P_{e|i} = P\left[\bigcup_{j \neq i} \{Z_i < Z_j\} \,\Big|\, i \text{ sent}\right]$$

where $\{Z_j\}$ are the decision statistics. Using the union bound (6.49), we obtain

$$P_{e|i} \le \sum_{j \neq i} P[Z_i < Z_j \,|\, i \text{ sent}] \qquad (6.51)$$

But the $j$th term on the right-hand side above is simply the error probability of ML reception for binary hypothesis testing between the signals $s_i$ and $s_j$. From the results of Section 6.3.2, we therefore obtain the following pairwise error probability:

$$P[Z_i < Z_j \,|\, i \text{ sent}] = Q\left(\frac{\|s_j - s_i\|}{2\sigma}\right)$$

Substituting  into  (6.51),  we  obtain  upper  bounds  on  the  conditional  error  probabilities  and  the 
average  error  probability  as  follows. 
Union Bound on conditional error probabilities: The conditional error probabilities for the ML rule are bounded as

$$P_{e|i} \le \sum_{j \neq i} Q\left(\frac{d_{ij}}{2\sigma}\right) \qquad (6.52)$$

where $d_{ij} = \|s_i - s_j\|$ is the distance between signals $s_i$ and $s_j$.

Union bound on average error probability: Averaging the conditional error probabilities using the prior probabilities gives an upper bound on the average error probability as follows:

$$P_e = \sum_i \pi_i P_{e|i} \le \sum_i \pi_i \sum_{j \neq i} Q\left(\frac{d_{ij}}{2\sigma}\right) \qquad (6.53)$$


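We can now rewrite the union bound in terms of $E_b/N_0$ and the scale-invariant squared distances $d_{ij}^2/E_b$ as follows:

$$P_{e|i} \le \sum_{j \neq i} Q\left(\sqrt{\frac{d_{ij}^2}{E_b}\,\frac{E_b}{2N_0}}\right) \qquad (6.54)$$

$$P_e = \sum_i \pi_i P_{e|i} \le \sum_i \pi_i \sum_{j \neq i} Q\left(\sqrt{\frac{d_{ij}^2}{E_b}\,\frac{E_b}{2N_0}}\right) \qquad (6.55)$$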
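Because (6.54) and (6.55) need only the pairwise distances and $E_b$, the union bound is easy to compute for any constellation whose points can be enumerated. A minimal Python sketch, assuming equal priors $\pi_i = 1/M$ and real-valued signal vectors (the function name is hypothetical):

```python
import math

def q_func(x):
    """Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def union_bound(signals, ebno_db):
    """Average union bound (6.55), assuming equal priors pi_i = 1/M."""
    M = len(signals)
    es = sum(sum(x * x for x in s) for s in signals) / M  # energy per symbol
    eb = es / math.log2(M)                               # energy per bit
    ebno = 10 ** (ebno_db / 10)
    total = 0.0
    for i, si in enumerate(signals):
        for j, sj in enumerate(signals):
            if j != i:
                d2 = sum((a - b) ** 2 for a, b in zip(si, sj))  # d_ij^2
                total += q_func(math.sqrt(d2 / eb * ebno / 2))   # one term of (6.54)
    return total / M

qpsk = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
for snr_db in (0.0, 5.0, 10.0):
    print(snr_db, union_bound(qpsk, snr_db))
```

For QPSK with points $(\pm 1, \pm 1)$, the function returns $2Q(\sqrt{2E_b/N_0}) + Q(\sqrt{4E_b/N_0})$, which is the QPSK union bound (6.60) derived later in this section; rescaling the points leaves the result unchanged.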


Applying the union bound to Figure 6.23, we obtain

$$P_{e|0} \le Q\left(\frac{\|s_1 - s_0\|}{2\sigma}\right) + Q\left(\frac{\|s_2 - s_0\|}{2\sigma}\right) + Q\left(\frac{\|s_3 - s_0\|}{2\sigma}\right) + Q\left(\frac{\|s_4 - s_0\|}{2\sigma}\right)$$

Notice that this answer is different from the one we had in (6.50). This is because the fourth term corresponds to the signal $s_4$, which is "too far away" from $s_0$ to play a role in determining the decision region $\Gamma_0$. Thus, when we do have a more detailed geometric understanding of the decision regions, we can do better than the generic union bound (6.52) and get a tighter bound, as in (6.50). We term this the intelligent union bound, and give a general formulation in the following.

Denote by $N_{ML}(i)$ the indices of the set of neighbors of signal $s_i$ (we exclude $i$ from $N_{ML}(i)$ by definition) that characterize the ML decision region $\Gamma_i$. That is, the half-planes that we intersect to obtain $\Gamma_i$ correspond to the perpendicular bisectors of lines joining $s_i$ and $s_j$, $j \in N_{ML}(i)$. For example, in Figure 6.23, $N_{ML}(0) = \{1, 2, 3\}$; $s_4$ is excluded from this set, since it does not play a role in determining $\Gamma_0$. The decision region in (6.41) can now be expressed as

$$\Gamma_i = \{y : \delta_{ML}(y) = i\} = \{y : Z_i > Z_j \text{ for all } j \in N_{ML}(i)\} \qquad (6.56)$$

We can now say the following: $y$ falls outside $\Gamma_i$ if and only if $Z_i < Z_j$ for some $j \in N_{ML}(i)$. We can therefore write

$$P_{e|i} = P[y \notin \Gamma_i \,|\, i \text{ sent}] = P[Z_i < Z_j \text{ for some } j \in N_{ML}(i) \,|\, i \text{ sent}] \qquad (6.57)$$

and from there, following the same steps as in the union bound, get a tighter bound, which we express as follows.
Intelligent Union Bound: A better bound on $P_{e|i}$ is obtained by considering only the neighbors of $s_i$ that determine its ML decision region, as follows:

$$P_{e|i} \le \sum_{j \in N_{ML}(i)} Q\left(\frac{\|s_j - s_i\|}{2\sigma}\right) \qquad (6.58)$$

In terms of $E_b/N_0$, we get

$$P_{e|i} \le \sum_{j \in N_{ML}(i)} Q\left(\sqrt{\frac{d_{ij}^2}{E_b}\,\frac{E_b}{2N_0}}\right) \qquad (6.59)$$

(the bound on the average error probability $P_e$ is computed as before by averaging the bounds on $P_{e|i}$ using the priors).


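Union Bound for QPSK: For QPSK, we infer from Figure 6.22 that the union bound for $P_{e|0}$ is given by

$$P_e = P_{e|0} \le Q\left(\frac{d_{01}}{2\sigma}\right) + Q\left(\frac{d_{02}}{2\sigma}\right) + Q\left(\frac{d_{03}}{2\sigma}\right) = 2Q\left(\frac{d}{2\sigma}\right) + Q\left(\frac{\sqrt{2}\,d}{2\sigma}\right)$$

Using $d^2/E_b = 4$, we obtain the union bound in terms of $E_b/N_0$ to be

$$P_e \le 2Q\left(\sqrt{\frac{2E_b}{N_0}}\right) + Q\left(\sqrt{\frac{4E_b}{N_0}}\right) \qquad \text{QPSK union bound} \qquad (6.60)$$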


For moderately large $E_b/N_0$, the dominant term in terms of the decay of the error probability is the first one, since $Q(x)$ falls off rapidly as $x$ gets large. Thus, while the union bound (6.60) is larger than the exact error probability (6.47), as it must be, it gets the multiplicity and argument of the dominant term right. Tightening the analysis using the intelligent union bound, we get

$$P_{e|0} \le 2Q\left(\sqrt{\frac{2E_b}{N_0}}\right) \qquad \text{QPSK intelligent union bound} \qquad (6.61)$$

since $N_{ML}(0) = \{1, 2\}$ (the decision region for $s_0$ is determined by the neighbors $s_1$ and $s_2$).
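To see how tight the QPSK bounds are, the Python sketch below (standard library only; names illustrative) tabulates the exact result (6.47), the intelligent union bound (6.61), and the union bound (6.60) over a few values of $E_b/N_0$.

```python
import math

def q_func(x):
    """Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def qpsk_curves(ebno_db):
    """Return (exact (6.47), intelligent union bound (6.61), union bound (6.60))."""
    ebno = 10 ** (ebno_db / 10)
    a = math.sqrt(2 * ebno)
    exact = 2 * q_func(a) - q_func(a) ** 2
    iub = 2 * q_func(a)
    ub = iub + q_func(math.sqrt(4 * ebno))
    return exact, iub, ub

for snr_db in (0, 4, 8, 12):
    exact, iub, ub = qpsk_curves(snr_db)
    print(f"{snr_db:2d} dB: exact {exact:.3e}  IUB {iub:.3e}  UB {ub:.3e}  UB/exact {ub / exact:.4f}")
```

The ratio of the union bound to the exact value approaches one as $E_b/N_0$ grows, consistent with the observation that the bounds get the dominant term right.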

Another common approach for getting a better (and quicker to compute) estimate than the original union bound is the nearest neighbors approximation. This is a loose term employed to describe a number of different methods for pruning the terms in the summation (6.52). Most commonly, it refers to regular signal sets in which each signal point has a number of nearest neighbors at distance $d_{min}$ from it, where $d_{min} = \min_{i \neq j} \|s_i - s_j\|$. Letting $N_{d_{min}}(i)$ denote the number of nearest neighbors of $s_i$, we obtain the following approximation.

Nearest Neighbors Approximation

$$P_{e|i} \approx N_{d_{min}}(i)\, Q\left(\frac{d_{min}}{2\sigma}\right) \qquad (6.62)$$

Averaging over $i$, we obtain that

$$P_e \approx \bar{N}_{d_{min}}\, Q\left(\frac{d_{min}}{2\sigma}\right) \qquad (6.63)$$
where $\bar{N}_{d_{min}}$ denotes the average number of nearest neighbors for a signal point. The rationale for the nearest neighbors approximation is that, since $Q(x)$ decays rapidly ($Q(x) \sim e^{-x^2/2}$ as $x$ gets large), the terms in the union bound corresponding to the smallest arguments for the Q function dominate at high SNR.

The corresponding formulas as a function of scale-invariant quantities and $E_b/N_0$ are:

$$P_{e|i} \approx N_{d_{min}}(i)\, Q\left(\sqrt{\frac{d_{min}^2}{E_b}\,\frac{E_b}{2N_0}}\right) \qquad (6.64)$$


It is also worth explicitly writing down an expression for the average error probability, averaging the preceding over $i$:

$$P_e \approx \bar{N}_{d_{min}}\, Q\left(\sqrt{\frac{d_{min}^2}{E_b}\,\frac{E_b}{2N_0}}\right) \qquad (6.65)$$

where

$$\bar{N}_{d_{min}} = \frac{1}{M}\sum_{i=0}^{M-1} N_{d_{min}}(i)$$

is the average number of nearest neighbors for the signal points in the constellation.
For QPSK, we have from Figure 6.22 that

$$N_{d_{min}}(i) \equiv 2 = \bar{N}_{d_{min}}$$

and

$$\frac{d_{min}^2}{E_b} = 4$$

yielding

$$P_e \approx 2Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$$


In  this  case,  the  nearest  neighbors  approximation  coincides  with  the  intelligent  union  bound 
(6.61).  This  happens  because  the  ML  decision  region  for  each  signal  point  is  determined  by  its 
nearest  neighbors  for  QPSK.  Indeed,  the  latter  property  holds  for  many  regular  constellations, 
including  all  of  the  PSK  and  QAM  constellations  whose  ML  decision  regions  are  depicted  in 
Figure  6.16. 
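The nearest neighbors quantities are mechanical to extract from a list of points. The Python sketch below assumes equal priors and complex-valued (two-dimensional) points, and treats two distances as equal when they agree to a small relative tolerance (an implementation choice, not from the text). It returns $d_{min}^2/E_b$ and $\bar{N}_{d_{min}}$ and evaluates (6.65); the function names are illustrative.

```python
import cmath
import math

def q_func(x):
    """Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def nn_parameters(points, rel_tol=1e-9):
    """Return (d_min^2 / E_b, average N_dmin) for equally likely complex-valued points."""
    M = len(points)
    eb = sum(abs(p) ** 2 for p in points) / M / math.log2(M)
    dmin = min(abs(p - q) for i, p in enumerate(points) for j, q in enumerate(points) if i != j)
    counts = [
        sum(1 for j, q in enumerate(points) if j != i and abs(p - q) <= dmin * (1 + rel_tol))
        for i, p in enumerate(points)
    ]
    return dmin ** 2 / eb, sum(counts) / M

def nn_approx(points, ebno_db):
    """Nearest neighbors approximation (6.65)."""
    ratio, n_avg = nn_parameters(points)
    ebno = 10 ** (ebno_db / 10)
    return n_avg * q_func(math.sqrt(ratio * ebno / 2))

qpsk = [complex(x, y) for x in (-1, 1) for y in (-1, 1)]
psk8 = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]
for name, pts in (("QPSK", qpsk), ("8PSK", psk8)):
    print(name, nn_parameters(pts), nn_approx(pts, 8.0))
```

For 8PSK this recovers $d_{min}^2/E_b = 6 - 3\sqrt{2}$ and $\bar{N}_{d_{min}} = 2$, the values used in Code Fragment 6.3.1.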

Power Efficiency: While exact performance analysis for M-ary signaling can be computationally demanding, we have now obtained simple enough estimates that we can define concepts such as power efficiency, analogous to the development for binary signaling. In particular, comparing the nearest neighbors approximation (6.63) with the error probability for binary signaling (6.40), we define in analogy the power efficiency of an M-ary signaling scheme as

$$\eta_P = \frac{d_{min}^2}{E_b} \qquad (6.66)$$

We can rewrite the nearest neighbors approximation as

$$P_e \approx \bar{N}_{d_{min}}\, Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right) \qquad (6.67)$$
Figure 6.24: ML decision regions for 16QAM with scaling chosen for convenience in computing power efficiency.


Since the argument of the Q function in (6.67) plays a bigger role than the multiplicity for moderately large SNR, $\eta_P$ offers a means of quickly comparing the power efficiency of different signaling constellations, as well as of determining the dependence of performance on $E_b/N_0$.

Performance analysis for 16QAM: We now apply the preceding performance analysis to the 16QAM constellation depicted in Figure 6.24, where we have chosen a convenient scale for the constellation. We now compute the nearest neighbors approximation, which coincides with the intelligent union bound, since the ML decision regions are determined by the nearest neighbors. Noting that the number of nearest neighbors is four for the four innermost signal points, two for the four outermost (corner) signal points, and three for the remaining eight signal points, we obtain upon averaging

$$\bar{N}_{d_{min}} = 3 \qquad (6.68)$$

It remains to compute the power efficiency $\eta_P$ and apply (6.67). We had done this in the preview in Chapter 4, but we repeat it here. For the scaling shown, we have $d_{min} = 2$. The energy per symbol is obtained as follows:

$$E_s = \text{average energy of I component} + \text{average energy of Q component} = 2\,(\text{average energy of I component})$$

by symmetry. Since the I component is equally likely to take the four values $\pm 1$ and $\pm 3$, we have

$$\text{average energy of I component} = \frac{1}{2}(1^2 + 3^2) = 5$$

We therefore obtain

$$E_s = 10, \qquad E_b = \frac{E_s}{\log_2 M} = \frac{10}{\log_2 16} = \frac{5}{2}$$

The power efficiency is therefore given by

$$\eta_P = \frac{d_{min}^2}{E_b} = \frac{2^2}{5/2} = \frac{8}{5} \qquad (6.69)$$


Substituting (6.68) and (6.69) into (6.67), we obtain that

$$P_e(\text{16QAM}) \approx 3\,Q\left(\sqrt{\frac{4E_b}{5N_0}}\right) \qquad (6.70)$$

as the nearest neighbors approximation and intelligent union bound for 16QAM. The bandwidth efficiency for 16QAM is 4 bits/2 dimensions, which is twice that of QPSK, whose bandwidth efficiency is 2 bits/2 dimensions. It is not surprising, therefore, that the power efficiency of 16QAM ($\eta_P = 1.6$) is smaller than that of QPSK ($\eta_P = 4$). We often encounter such tradeoffs between power and bandwidth efficiency in the design of communication systems, including when the signaling waveforms considered are sophisticated codes that are constructed from multiple symbols drawn from constellations such as PSK and QAM.
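The 16QAM numbers above can be reproduced by brute force over the 16 points with I and Q components in $\{\pm 1, \pm 3\}$ (the scaling of Figure 6.24). A minimal Python sketch (variable and function names illustrative), which also evaluates the approximation (6.70) and the dB gap to QPSK implied by the ratio of power efficiencies:

```python
import math
from itertools import product

# 16QAM with I and Q components in {-3, -1, 1, 3}.
levels = (-3, -1, 1, 3)
qam16 = [complex(i, q) for i, q in product(levels, repeat=2)]
M = len(qam16)

es = sum(s.real ** 2 + s.imag ** 2 for s in qam16) / M   # energy per symbol
eb = es / math.log2(M)                                   # energy per bit
dmin = min(abs(a - b) for a in qam16 for b in qam16 if a != b)
counts = [sum(1 for b in qam16 if b != a and abs(a - b) <= dmin + 1e-9) for a in qam16]
n_avg = sum(counts) / M                                  # average number of nearest neighbors
eta_p = dmin ** 2 / eb                                   # power efficiency (6.66)
gap_db = 10 * math.log10(4.0 / eta_p)                    # QPSK has eta_P = 4

def pe_16qam_nn(ebno_db):
    """Nearest neighbors approximation (6.70): 3 Q(sqrt(4 Eb / (5 N0)))."""
    ebno = 10 ** (ebno_db / 10)
    return n_avg * 0.5 * math.erfc(math.sqrt(eta_p * ebno / 2) / math.sqrt(2))

print(es, eb, dmin, n_avg, eta_p, round(gap_db, 2), pe_16qam_nn(12.0))
```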


Figure  6.25:  Symbol  error  probabilities  for  QPSK  and  16QAM. 

Figure 6.25 shows the symbol error probabilities for QPSK and 16QAM, comparing the intelligent union bounds (which coincide with the nearest neighbors approximations) with exact results. The exact computations for 16QAM use the closed-form expression derived in Problem 6.21. We see that the exact error probability and the intelligent union bound are virtually indistinguishable. The power efficiencies of the constellations (which determine the argument of the Q function) accurately predict the distance between the curves: $10\log_{10}(4/1.6) = 10\log_{10} 2.5 \approx 4$ dB.

From Figure 6.25, we see that the distance between the QPSK and 16QAM curves at small error probabilities (high SNR) is indeed about 4 dB.


Figure 6.26: Performance analysis for BPSK with phase offset.


The performance analysis techniques developed here can also be applied to suboptimal receivers. Suppose, for example, that the receiver LO in a BPSK system is offset from the incoming carrier by a phase shift $\theta$, but that the receiver uses decision regions corresponding to no phase offset. The signal space picture is now as in Figure 6.26. With $D$ denoting the distance from each (rotated) signal point to the decision boundary, the error probability is now given by

$$P_e = P_{e|0} = P_{e|1} = Q\left(\frac{D}{\sigma}\right) = Q\left(\sqrt{\frac{2D^2}{E_b}\,\frac{E_b}{N_0}}\right)$$

For the scaling shown, $D = \cos\theta$ and $E_b = 1$, which gives

$$P_e = Q\left(\sqrt{\frac{2E_b\cos^2\theta}{N_0}}\right)$$

so that there is a loss of $10\log_{10}(1/\cos^2\theta)$ dB in performance due to the phase offset (e.g., $\theta = 10^\circ$ leads to a loss of 0.13 dB, while $\theta = 30^\circ$ leads to a loss of 1.25 dB).
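The dB figures quoted for the phase offset follow from $10\log_{10}(1/\cos^2\theta)$. A small Python sketch (names illustrative) that reproduces them and checks that the offset acts purely as an SNR loss:

```python
import math

def q_func(x):
    """Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def phase_offset_loss_db(theta_deg):
    """SNR penalty 10 log10(1 / cos^2(theta)) of BPSK with an uncorrected LO phase offset."""
    c = math.cos(math.radians(theta_deg))
    return 10 * math.log10(1 / (c * c))

def bpsk_pe_with_offset(ebno_db, theta_deg):
    """Pe = Q(sqrt(2 Eb cos^2(theta) / N0)) for the mismatched receiver."""
    ebno = 10 ** (ebno_db / 10)
    c = math.cos(math.radians(theta_deg))
    return q_func(math.sqrt(2 * ebno * c * c))

for theta in (0, 10, 30):
    print(theta, round(phase_offset_loss_db(theta), 2), bpsk_pe_with_offset(8.0, theta))
```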


6.3.5  Performance  analysis  for  M-ary  orthogonal  modulation 


So far, our examples have focused on two-dimensional modulation, which is what we use when our primary concern is bandwidth efficiency. We now turn our attention to equal-energy, M-ary orthogonal signaling, which, as we have mentioned before, lies at the other extreme of the power-bandwidth tradeoff space: as $M \to \infty$, the power efficiency reaches the highest possible value of any signaling scheme over the AWGN channel, while the bandwidth efficiency tends to zero. The signal space is M-dimensional in this case, but we can actually get expressions for the probability of error that involve a single integral rather than M-dimensional integrals, by exploiting the orthogonality of the signal constellation.

Let us first quickly derive the union bound. Without loss of generality, take the $M$ orthogonal signals as unit vectors along the $M$ axes in our signal space. With this scaling, we have $\|s_i\|^2 = 1$, so that $E_s = 1$ and $E_b = \frac{1}{\log_2 M}$. Since the signals are orthogonal, the squared distance between any two signals is

$$d_{ij}^2 = \|s_i - s_j\|^2 = \|s_i\|^2 + \|s_j\|^2 - 2\langle s_i, s_j\rangle = 2E_s = 2, \quad i \neq j$$

Thus, $d_{min} = d_{ij}$ ($i \neq j$) and the power efficiency is

$$\eta_P = \frac{d_{min}^2}{E_b} = 2\log_2 M$$


The  union  bound,  intelligent  union  bound  and  nearest  neighbors  approximation  all  coincide,  and 
we  get 


Pe  =  Pe\i  <  ^  Q 


We  now  get  the  following  expression  in  terms  of  Eb/No- 

Union  bound  on  error  probability  for  M-ary  orthogonal  signaling 

P,  <(M-l)Q(^Si^)  (6.71) 
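The bound (6.71) is a one-line computation. The sketch below evaluates it for a few values of M at a fixed $E_b/N_0$; the function names are ours, and the bound is returned as-is even when it exceeds 1 (which happens at low SNR, where it carries no information). For $M = 2$ the bound is exact, since binary orthogonal signaling has $P_e = Q(\sqrt{E_b/N_0})$.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def orthogonal_union_bound(M, ebno_db):
    """Union bound (6.71) on symbol error probability for M-ary orthogonal signaling.

    Returns (M - 1) * Q(sqrt(Eb * log2(M) / N0)); at low Eb/N0 the bound can
    exceed 1, in which case it carries no information.
    """
    ebno = 10.0 ** (ebno_db / 10.0)
    return (M - 1) * q_func(math.sqrt(ebno * math.log2(M)))


for M in (2, 4, 16, 64):
    print(M, f"{orthogonal_union_bound(M, 8.0):.3e}")
```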

Exact expressions: By symmetry, the error probability equals the conditional error probability, conditioned on any one of the hypotheses; similarly, the probability of correct decision equals the probability of correct decision given any of the hypotheses. Let us therefore condition on hypothesis $H_0$ (i.e., that $s_0$ is sent), so that the received signal is $y = s_0 + n$. The decision statistics are

$$Z_i = \langle s_0 + n, s_i\rangle = E_s\delta_{0i} + N_i, \quad i = 0, 1, \ldots, M-1$$

where $\{N_i = \langle n, s_i\rangle\}$ are jointly Gaussian, zero mean, with

$$\mathrm{cov}(N_i, N_j) = \sigma^2\langle s_i, s_j\rangle = \sigma^2 E_s\delta_{ij}$$

Thus, $N_i \sim N(0, \sigma^2 E_s)$ are i.i.d. We therefore infer that, conditioned on $s_0$ sent, the $\{Z_i\}$ are conditionally independent, with $Z_0 \sim N(E_s, \sigma^2 E_s)$, and $Z_i \sim N(0, \sigma^2 E_s)$ for $i = 1, \ldots, M-1$.

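Let us now express the decision statistics in scale-invariant terms, by replacing $Z_i$ by $\frac{Z_i}{\sigma\sqrt{E_s}}$. This gives $Z_0 \sim N(m, 1)$, $Z_1, \ldots, Z_{M-1} \sim N(0, 1)$, conditionally independent, where

$$m = \frac{E_s}{\sigma\sqrt{E_s}} = \frac{\sqrt{E_s}}{\sigma} = \sqrt{\frac{2E_s}{N_0}} = \sqrt{\frac{2E_b\log_2 M}{N_0}}$$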

The conditional probability of correct reception is now given by

$$\begin{aligned} P_{c|0} &= P[Z_1 < Z_0, \ldots, Z_{M-1} < Z_0 \mid H_0] = \int P[Z_1 < x, \ldots, Z_{M-1} < x \mid Z_0 = x, H_0]\, p_{Z_0|H_0}(x|H_0)\,dx \\ &= \int P[Z_1 < x|H_0] \cdots P[Z_{M-1} < x|H_0]\, p_{Z_0|H_0}(x|H_0)\,dx \end{aligned}$$

where we have used the conditional independence of the $\{Z_i\}$. Plugging in the conditional distributions, we get the following expression for the probability of correct reception.

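Probability of correct reception for M-ary orthogonal signaling

$$P_c = P_{c|0} = \int_{-\infty}^{\infty} [\Phi(x)]^{M-1}\,\frac{1}{\sqrt{2\pi}}e^{-(x-m)^2/2}\,dx \tag{6.72}$$

where $m = \sqrt{2E_s/N_0} = \sqrt{2E_b\log_2 M/N_0}$.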

The  probability  of  error  is,  of  course,  one  minus  the  preceding  expression.  But  for  small  error 
probabilities,  the  probability  of  correct  reception  is  close  to  one,  and  it  is  difficult  to  get  good 
estimates  of  the  error  probability  using  (6.72).  We  therefore  develop  an  expression  for  the  error 
probability  that  can  be  directly  computed,  as  follows: 


$$P_{e|0} = \sum_{j \neq 0} P[Z_j = \max_i Z_i \mid H_0] = (M-1)\,P[Z_1 = \max_i Z_i \mid H_0]$$

where  we  have  used  symmetry.  Now, 


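$$\begin{aligned} P[Z_1 = \max_i Z_i \mid H_0] &= P[Z_0 < Z_1, Z_2 < Z_1, \ldots, Z_{M-1} < Z_1 \mid H_0] \\ &= \int P[Z_0 < x, Z_2 < x, \ldots, Z_{M-1} < x \mid Z_1 = x, H_0]\, p_{Z_1|H_0}(x|H_0)\,dx \\ &= \int P[Z_0 < x|H_0]\,P[Z_2 < x|H_0] \cdots P[Z_{M-1} < x|H_0]\, p_{Z_1|H_0}(x|H_0)\,dx \end{aligned}$$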


Plugging  in  the  conditional  distributions,  and  multiplying  by  M— 1,  gives  the  following  expression 
for  the  error  probability. 


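Probability of error for M-ary orthogonal signaling

$$P_e = P_{e|0} = (M-1)\int_{-\infty}^{\infty} [\Phi(x)]^{M-2}\,\Phi(x-m)\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx \tag{6.73}$$

where $m = \sqrt{2E_b\log_2 M/N_0}$.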
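Both (6.72) and (6.73) are one-dimensional integrals that are easy to evaluate numerically. The following is a minimal Python sketch using only the standard library; the plain trapezoidal rule, the truncation of the integration range to ±12 around the Gaussian factor, and all function names are our own choices, not something the text prescribes.

```python
import math


def phi_cdf(x):
    """Standard Gaussian CDF, Phi(x) = 1 - Q(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi_pdf(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def trapezoid(f, lo, hi, n=20000):
    """Composite trapezoidal rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h


def mean_m(M, ebno_db):
    """m = sqrt(2 Eb log2(M) / N0), the mean of the normalized statistic Z_0."""
    return math.sqrt(2.0 * 10.0 ** (ebno_db / 10.0) * math.log2(M))


def orthogonal_pc(M, ebno_db):
    """Probability of correct reception, eq. (6.72)."""
    m = mean_m(M, ebno_db)
    # The density factor is centered at m, so integrate over m +/- 12.
    return trapezoid(lambda x: phi_cdf(x) ** (M - 1) * phi_pdf(x - m), m - 12.0, m + 12.0)


def orthogonal_pe(M, ebno_db):
    """Probability of error computed directly, eq. (6.73)."""
    m = mean_m(M, ebno_db)
    # phi_pdf(x) is negligible outside [-12, 12], and the other factors are at most 1.
    f = lambda x: phi_cdf(x) ** (M - 2) * phi_cdf(x - m) * phi_pdf(x)
    return (M - 1) * trapezoid(f, -12.0, 12.0)


for ebno_db in (4.0, 8.0, 12.0):
    pe = orthogonal_pe(16, ebno_db)
    print(f"{ebno_db:4.1f} dB: (6.73) -> {pe:.3e}, 1 - (6.72) -> {1.0 - orthogonal_pc(16, ebno_db):.3e}")
```

At the highest $E_b/N_0$ in this loop, the error probability is so small that $1 - P_c$ can be swamped by floating-point rounding in the computed $P_c$, whereas (6.73) computes the small quantity directly; this is the practical reason for preferring (6.73).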
Figure 6.27: Symbol error probabilities for M-ary orthogonal signaling.


Asymptotics for large M: The error probability for M-ary orthogonal signaling exhibits an interesting thresholding effect as M gets large:

$$\lim_{M \to \infty} P_e = \begin{cases} 0, & \frac{E_b}{N_0} > \ln 2 \\ 1, & \frac{E_b}{N_0} < \ln 2 \end{cases} \tag{6.74}$$


That is, by letting M get large, we can get arbitrarily reliable performance as long as $E_b/N_0$ exceeds $-1.6$ dB ($\ln 2$ expressed in dB). This result is derived in one of the problems. In fact, we can show using the tools of information theory that this is the best we can do over the AWGN channel in the limit of bandwidth efficiency tending to zero. That is, M-ary orthogonal signaling is asymptotically optimum in terms of power efficiency.

Figure 6.27 shows the probability of symbol error as a function of $E_b/N_0$ for several values of M. We see that the performance is quite far away from the asymptotic limit of $-1.6$ dB (also marked on the plot) for the moderate values of M considered. For example, the $E_b/N_0$ required for achieving an error probability of $10^{-6}$ for $M = 16$ is more than 9 dB away from the asymptotic limit.
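The size of this gap can be estimated without the figure. The sketch below, a rough check under our own assumptions, finds the $E_b/N_0$ at which the union bound (6.71) reaches a target error probability by bisection (the bound decreases monotonically in $E_b/N_0$). Because the union bound upper-bounds $P_e$, the result slightly overstates the exact requirement, but the bound is tight at error probabilities this small.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def union_bound(M, ebno_db):
    """(M - 1) Q(sqrt(Eb log2(M) / N0)), the union bound (6.71)."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return (M - 1) * q_func(math.sqrt(ebno * math.log2(M)))


def required_ebno_db(M, target_pe, lo=-10.0, hi=30.0, iters=100):
    """Eb/N0 (dB) at which the union bound meets target_pe, found by bisection.

    The bound decreases monotonically in Eb/N0, so bisection applies.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if union_bound(M, mid) > target_pe:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


limit_db = 10.0 * math.log10(math.log(2.0))  # ln 2 in dB, about -1.59 dB
req_db = required_ebno_db(16, 1e-6)
print(f"M = 16, Pe = 1e-6: about {req_db:.2f} dB, {req_db - limit_db:.2f} dB from the limit")
```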


6.4  Bit  Error  Probability 

We  now  know  how  to  design  rules  for  deciding  which  of  M  signals  (or  symbols)  has  been  sent, 
and  how  to  estimate  the  performance  of  these  decision  rules.  Sending  one  of  M  signals  conveys 
m  =  log2  M  bits,  so  that  a  hard  decision  on  one  of  these  signals  actually  corresponds  to  hard 
decisions  on  m  bits.  In  this  section,  we  discuss  how  to  estimate  the  bit  error  probability,  or  the 
bit  error  rate  (BER),  as  it  is  often  called. 

QPSK with Gray coding: We begin with the example of QPSK, with the bit mapping shown in Figure 6.28. This bit mapping is an example of a Gray code, in which the bits corresponding to neighboring symbols differ by exactly one bit (since symbol errors are most likely to occur by decoding into neighboring decision regions, this reduces the number of bit errors). Let us denote the symbol label for the transmitted symbol as $b[1]b[2]$, where $b[1]$ and $b[2]$ each take values 0 and 1. Letting $\hat{b}[1]\hat{b}[2]$ denote the label for the ML symbol decision, the probabilities of bit error are given by $p_1 = P[\hat{b}[1] \neq b[1]]$ and $p_2 = P[\hat{b}[2] \neq b[2]]$. The average probability of bit error, which we wish to estimate, is given by $p_b = \frac{1}{2}(p_1 + p_2)$.

Figure 6.28: QPSK with Gray coding.

Conditioned on 00 being sent, the probability of making an error on $b[1]$ is as follows:


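$$P[\hat{b}[1] = 1 \mid 00 \text{ sent}] = P[\text{ML decision is 10 or 11} \mid 00 \text{ sent}] = P\left[N_c < -\frac{d}{2}\right] = Q\left(\frac{d}{2\sigma}\right) = Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$$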

where, as before, we have expressed the result in terms of $E_b/N_0$ using the power efficiency $\frac{d^2}{E_b} = 4$. We also note, by the symmetry of the constellation and the bit map, that the conditional probability of error of $b[1]$ is the same, regardless of which symbol we condition on. Moreover, exactly the same analysis holds for $b[2]$, except that errors are caused by the noise random variable $N_s$. We therefore obtain that


$$p_b = p_1 = p_2 = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) \tag{6.75}$$

The  fact  that  this  expression  is  identical  to  the  bit  error  probability  for  binary  antipodal  signaling 
is  not  a  coincidence.  QPSK  with  Gray  coding  can  be  thought  of  as  two  independent  BPSK 
systems,  one  signaling  along  the  I  component,  and  the  other  along  the  Q  component. 
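The "two independent BPSK systems" view can be checked by simulation. The Monte Carlo sketch below assumes one particular Gray labeling (since Figure 6.28 is not reproduced here): $b[1]$ selects the sign of the I component and $b[2]$ the sign of the Q component, with bit 0 mapped to +1. The points are $(\pm 1, \pm 1)$, so $E_b = 1$ and $\sigma^2 = N_0/2$, and the ML decision reduces to a sign test on each component. The seed and sample size are arbitrary choices.

```python
import math
import random


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate_qpsk_gray_ber(ebno_db, num_symbols=100_000, seed=1):
    """Monte Carlo BER of Gray coded QPSK with ML (per-component sign) decisions.

    Assumed mapping: b[1] picks the sign of the I component and b[2] the sign
    of the Q component (bit 0 -> +1, bit 1 -> -1), so neighbors differ in one bit.
    Points are (+-1, +-1): Es = 2, Eb = 1, d = 2.
    """
    rng = random.Random(seed)
    ebno = 10.0 ** (ebno_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * ebno))  # Eb = 1, so N0 = 1/ebno and sigma^2 = N0/2
    bit_errors = 0
    for _ in range(num_symbols):
        b1, b2 = rng.getrandbits(1), rng.getrandbits(1)
        y_i = (1.0 - 2.0 * b1) + rng.gauss(0.0, sigma)  # I component plus noise N_c
        y_q = (1.0 - 2.0 * b2) + rng.gauss(0.0, sigma)  # Q component plus noise N_s
        b1_hat = 1 if y_i < 0 else 0
        b2_hat = 1 if y_q < 0 else 0
        bit_errors += (b1_hat != b1) + (b2_hat != b2)
    return bit_errors / (2 * num_symbols)


ebno_db = 4.0
print(simulate_qpsk_gray_ber(ebno_db), q_func(math.sqrt(2.0 * 10.0 ** (ebno_db / 10.0))))
```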

Gray  coding  is  particularly  useful  at  low  SNR  (e.g.,  for  heavily  coded  systems),  where  symbol 
errors  happen  more  often.  For  example,  in  a  coded  system,  we  would  pass  up  fewer  bit  errors  to 
the  decoder  for  the  same  number  of  symbol  errors.  We  define  it  in  general  as  follows. 

Gray Coding: Consider a $2^n$-ary constellation in which each point is represented by a binary string $\mathbf{b} = (b_1, \ldots, b_n)$. The bit assignment is said to be Gray coded if, for any two constellation points $\mathbf{b}$ and $\mathbf{b}'$ which are nearest neighbors, the bit representations $\mathbf{b}$ and $\mathbf{b}'$ differ in exactly one bit location.
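For PSK, one standard way to obtain such an assignment is the binary reflected Gray code, which labels point $k$ (at angle $2\pi k/M$) with $k \oplus \lfloor k/2 \rfloor$. The sketch below, with illustrative function names, builds these labels and checks the definition directly: for $M \ge 3$ the nearest neighbors of a PSK point are the two adjacent points, so every cyclically adjacent pair must differ in exactly one bit.

```python
def binary_reflected_gray(n):
    """Labels k XOR (k >> 1) for k = 0, ..., 2^n - 1 (the binary reflected Gray code)."""
    return [k ^ (k >> 1) for k in range(2 ** n)]


def is_gray_for_psk(labels):
    """True if every pair of cyclically adjacent PSK points differs in exactly one bit.

    For M-PSK with M >= 3, the two nearest neighbors of point k are points k - 1 and k + 1.
    """
    M = len(labels)
    return all(bin(labels[k] ^ labels[(k + 1) % M]).count("1") == 1 for k in range(M))


labels = binary_reflected_gray(3)  # 8PSK: point k sits at angle 2*pi*k/8
print([format(b, "03b") for b in labels])
# ['000', '001', '011', '010', '110', '111', '101', '100']
print(is_gray_for_psk(labels), is_gray_for_psk(list(range(8))))
# True False
```

Natural binary labeling fails the check because, for example, 111 and 000 sit next to each other on the circle.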

Nearest neighbors approximation for BER with Gray coded constellation: Consider the $i$th bit $b_i$ in an n-bit Gray code for a regular constellation with minimum distance $d_{\min}$. For a Gray code, there is at most one nearest neighbor which differs in the $i$th bit, and the pairwise error probability of decoding to that neighbor is $Q\left(\frac{d_{\min}}{2\sigma}\right)$. We therefore have

$$P(\text{bit error}) \approx Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right) \quad \text{with Gray coding} \tag{6.76}$$

where $\eta_P = \frac{d_{\min}^2}{E_b}$ is the power efficiency.
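Applying (6.76) only requires $\eta_P$, which is scale-invariant and can be computed from any convenient scaling of the constellation. The sketch below computes $\eta_P$ by brute force over all pairs of points (assuming equally likely symbols) and plugs it into (6.76) for the two constellations of Figure 6.29; the function names and the 12 dB operating point are illustrative. It yields $\eta_P = 1.6$ for 16QAM and $16\sin^2(\pi/16) \approx 0.61$ for 16PSK.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def power_efficiency(points):
    """eta_P = d_min^2 / E_b for equiprobable complex constellation points."""
    M = len(points)
    es = sum(abs(p) ** 2 for p in points) / M
    eb = es / math.log2(M)
    dmin2 = min(abs(p - q) ** 2 for i, p in enumerate(points) for q in points[i + 1:])
    return dmin2 / eb


def gray_ber_nn(points, ebno_db):
    """Nearest neighbors approximation (6.76): Q(sqrt(eta_P Eb / (2 N0)))."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return q_func(math.sqrt(power_efficiency(points) * ebno / 2.0))


qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
psk16 = [complex(math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16)) for k in range(16)]
for name, pts in (("16QAM", qam16), ("16PSK", psk16)):
    print(name, round(power_efficiency(pts), 3), f"{gray_ber_nn(pts, 12.0):.2e}")
```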




Figure  6.29:  BER  for  16QAM  and  16PSK  with  Gray  coding. 


Figure 6.29 shows the BER of 16QAM and 16PSK with Gray coding, comparing the nearest neighbors approximation with exact results (obtained analytically for 16QAM, and by simulation for 16PSK). The nearest neighbors approximation is only slightly pessimistic and is easy to compute, which makes it an excellent tool for link design.

Gray coding may not always be possible. Indeed, for an arbitrary set of $M = 2^n$ signals, we may not understand the geometry well enough to assign a Gray code. In general, a necessary (but not sufficient) condition for an n-bit Gray code to exist is that the number of nearest neighbors for any signal point should be at most n.

BER for orthogonal modulation: For $M = 2^m$-ary equal energy, orthogonal modulation, each of the m bits splits the signal set in half. By the symmetric geometry of the signal set, any of the $M-1$ wrong symbols is equally likely to be chosen, given a symbol error, and $\frac{M}{2}$ of these will correspond to an error in a given bit. We therefore have

$$P(\text{bit error}) = \frac{M/2}{M-1}\,P(\text{symbol error}), \quad \text{BER for M-ary orthogonal signaling} \tag{6.77}$$

Note that Gray coding is out of the question here, since there are only m bits and $2^m - 1$ neighbors, all at the same distance.
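The counting argument behind (6.77) can be verified by brute force. Assuming, as above, that all $M-1$ wrong labels are equally likely given a symbol error, the expected fraction of the m bits in error is the average Hamming distance from the sent label to each wrong label, divided by m. The sketch below (function name ours) computes this exactly with rational arithmetic and compares it to $\frac{M/2}{M-1}$.

```python
from fractions import Fraction


def ber_over_ser_orthogonal(m, sent=0):
    """Average fraction of the m bits in error, given a symbol error, for M = 2^m orthogonal signals.

    Each of the M - 1 wrong labels is assumed equally likely (the symmetry argument
    above), so we average the Hamming distance to the sent label and divide by m.
    """
    M = 2 ** m
    total = sum(bin(sent ^ j).count("1") for j in range(M) if j != sent)
    return Fraction(total, m * (M - 1))


for m in (1, 2, 4, 8):
    M = 2 ** m
    print(M, ber_over_ser_orthogonal(m), Fraction(M // 2, M - 1))
```

The ratio approaches 1/2 as M grows: given a symbol error, roughly half the bits are wrong.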


6.5  Link  Budget  Analysis 


We have now seen that performance over the AWGN channel depends only on constellation geometry and $E_b/N_0$. In order to design a communication link, however, we must relate $E_b/N_0$ to physical parameters such as transmit power, transmit and receive antenna gains, range and the quality of the receiver circuitry. Let us first take stock of what we know:

(a) Given the bit rate $R_b$ and the signal constellation, we know the symbol rate (or more generally, the number of modulation degrees of freedom required per unit time), and hence the minimum Nyquist bandwidth $B_{\min}$. We can then factor in the excess bandwidth $a$ dictated by implementation considerations to find the bandwidth $B = (1+a)B_{\min}$ required. (However, assuming optimal receiver processing, we show below that the excess bandwidth does not affect the link budget.)

(b) Given the constellation and a desired bit error probability, we can infer the $E_b/N_0$ we need to operate at. Since the SNR satisfies $\mathrm{SNR} = \frac{E_b R_b}{N_0 B}$, we have

$$\mathrm{SNR}_{\text{reqd}} = \left(\frac{E_b}{N_0}\right)_{\text{reqd}}\frac{R_b}{B} \tag{6.78}$$

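(c) Given the receiver noise figure F (dB), we can infer the noise power $P_n = N_0 B = N_{0,\text{nom}}10^{F/10}B$, and hence the minimum required received signal power is given by

$$P_{RX}(\min) = \mathrm{SNR}_{\text{reqd}}P_n = \left(\frac{E_b}{N_0}\right)_{\text{reqd}}\frac{R_b}{B}N_0 B = \left(\frac{E_b}{N_0}\right)_{\text{reqd}}R_b N_{0,\text{nom}}10^{F/10} \tag{6.79}$$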

This is called the required receiver sensitivity, and is usually quoted in dBm, as $P_{RX,\text{dBm}}(\min) = 10\log_{10}P_{RX}(\min)\,(\text{mW})$. Using (5.93), we obtain that

$$P_{RX,\text{dBm}}(\min) = \left(\frac{E_b}{N_0}\right)_{\text{reqd,dB}} + 10\log_{10}R_b - 174 + F \tag{6.80}$$

where $R_b$ is in bits per second. Note that dependence on bandwidth B (and hence on excess bandwidth) cancels out in (6.79), so that the final expression for receiver sensitivity depends only on the required $E_b/N_0$ (which depends on the signaling scheme and target BER), the bit rate $R_b$, and the noise figure F.
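In code, (6.80) is a single line. The sketch below (function name ours) takes the required $E_b/N_0$ in dB as an input, since that number comes from the BER analysis rather than from the link budget itself. The constant $-174$ dBm/Hz is the nominal room-temperature thermal noise level that enters through (5.93). The sample inputs are those used in the worked examples later in this section.

```python
import math


def receiver_sensitivity_dbm(ebno_reqd_db, bit_rate_bps, noise_figure_db):
    """Eq. (6.80): (Eb/N0)_reqd,dB + 10 log10(Rb) - 174 + F, with Rb in bits/sec.

    Bandwidth does not appear because it cancels in (6.79).
    """
    return ebno_reqd_db + 10.0 * math.log10(bit_rate_bps) - 174.0 + noise_figure_db


print(round(receiver_sensitivity_dbm(10.2, 30e6, 6.0), 1))  # -83.0
print(round(receiver_sensitivity_dbm(10.2, 2e9, 8.0), 1))   # -62.8
```

Each doubling of the bit rate costs 3 dB of sensitivity, independent of how much excess bandwidth the pulse shaping uses.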

Once  we  know  the  receiver  sensitivity,  we  need  to  determine  the  link  parameters  (e.g.,  transmitted 
power,  choice  of  antennas,  range)  such  that  the  receiver  actually  gets  at  least  that  much  power, 
plus  a  link  margin  (typically  expressed  in  dB).  We  illustrate  such  considerations  via  the  Friis 
formula  for  propagation  loss  in  free  space,  which  we  can  think  of  as  modeling  a  line-of-sight 
wireless  link.  While  deriving  this  formula  from  basic  electromagnetics  is  beyond  our  scope  here, 
let  us  provide  some  intuition  before  stating  it. 

Suppose that a transmitter emits power $P_{TX}$ that radiates uniformly in all directions. The power per unit area at a distance R from the transmitter is $\frac{P_{TX}}{4\pi R^2}$, where we have divided by the area of a sphere of radius R. The receive antenna may be thought of as providing an effective area, termed the antenna aperture, for catching a portion of this power. (The aperture of an antenna is related to its size, but the relation is not usually straightforward.) If we denote the receive antenna aperture by $A_{RX}$, the received power is given by

$$P_{RX} = \frac{P_{TX}}{4\pi R^2}A_{RX}$$


Now,  if  the  transmitter  can  direct  power  selectively  in  the  direction  of  the  receiver  rather  than 
radiating  it  isotropically,  we  get 


$$P_{RX} = \frac{P_{TX}}{4\pi R^2}G_{TX}A_{RX} \tag{6.81}$$


where $G_{TX}$ is the transmit antenna's gain towards the receiver, relative to a hypothetical isotropic radiator. We now have a formula for received power in terms of transmitted power, which depends on the gain of the transmit antenna and the aperture of the receive antenna. We would like to express this formula solely in terms of antenna gains or antenna apertures. To do this, we need to relate the gain of an antenna to its aperture. To this end, we state without proof that the aperture of an isotropic antenna is given by $A = \frac{\lambda^2}{4\pi}$. Since the gain of an antenna is the ratio of its aperture to that of an isotropic antenna, the relation between gain and aperture can be written as

$$G = \frac{A}{\lambda^2/(4\pi)} = \frac{4\pi A}{\lambda^2} \tag{6.82}$$
Assuming that the aperture A scales up in some fashion with antenna size, this implies that, for a fixed form factor, we can get higher antenna gains as we decrease the carrier wavelength, or increase the carrier frequency.


Using  (6.82)  in  (6.81),  we  get  two  versions  of  the  Friis  formula: 


Friis formula for free space propagation

$$P_{RX} = P_{TX}G_{TX}G_{RX}\frac{\lambda^2}{16\pi^2R^2}, \quad \text{in terms of antenna gains} \tag{6.83}$$

$$P_{RX} = P_{TX}\frac{A_{TX}A_{RX}}{\lambda^2R^2}, \quad \text{in terms of antenna apertures} \tag{6.84}$$

where

• $G_{TX}$, $A_{TX}$ are the gain and aperture, respectively, of the transmit antenna,

• $G_{RX}$, $A_{RX}$ are the gain and aperture, respectively, of the receive antenna,

• $\lambda = \frac{c}{f_c}$ is the carrier wavelength ($c = 3 \times 10^8$ meters/sec is the speed of light, $f_c$ the carrier frequency),

• R is the range (line-of-sight distance between transmitter and receiver).


The first version (6.83) of the Friis formula tells us that, for antennas with fixed gain, we should try to use as low a carrier frequency (as large a wavelength) as possible. On the other hand, the second version (6.84) tells us that, if we have antennas of a given form factor, then we can get better performance as we increase the carrier frequency (decrease the wavelength), assuming of course that we can “point” these antennas accurately at each other. However, higher carrier frequencies also have the disadvantage of incurring more attenuation from impairments such as obstacles, rain and fog. Some of these tradeoffs are explored in the problems.
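The two versions of the Friis formula are the same equation, related through (6.82), yet they suggest opposite frequency trends. The sketch below makes that concrete. It implements (6.82) to (6.84) directly; the 1 cm² apertures, 10 m range and 0.1 W transmit power are hypothetical values chosen only for illustration. With apertures held fixed, moving from 5 GHz to 60 GHz raises the received power by $(60/5)^2 = 144$, about 21.6 dB.

```python
import math

C = 3e8  # speed of light in m/s, the value used in the text


def gain_from_aperture(aperture_m2, fc_hz):
    """Eq. (6.82): G = 4 pi A / lambda^2."""
    lam = C / fc_hz
    return 4.0 * math.pi * aperture_m2 / lam ** 2


def friis_gains(ptx_w, g_tx, g_rx, fc_hz, r_m):
    """Eq. (6.83): P_RX = P_TX G_TX G_RX lambda^2 / (16 pi^2 R^2)."""
    lam = C / fc_hz
    return ptx_w * g_tx * g_rx * lam ** 2 / (16.0 * math.pi ** 2 * r_m ** 2)


def friis_apertures(ptx_w, a_tx, a_rx, fc_hz, r_m):
    """Eq. (6.84): P_RX = P_TX A_TX A_RX / (lambda^2 R^2)."""
    lam = C / fc_hz
    return ptx_w * a_tx * a_rx / (lam ** 2 * r_m ** 2)


# Hypothetical 1 cm^2 apertures at both ends, 10 m apart, 0.1 W transmitted.
for fc in (5e9, 60e9):
    print(fc, friis_apertures(0.1, 1e-4, 1e-4, fc, 10.0))
```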


In order to apply the Friis formula (let us focus on version (6.83) for concreteness) to link budget analysis, it is often convenient to take logarithms, converting the multiplications into additions. On a logarithmic scale, antenna gains are expressed in dBi, where $G_{\text{dBi}} = 10\log_{10}G$ for an antenna with raw gain G. Expressing powers in dBm, we have

$$P_{RX,\text{dBm}} = P_{TX,\text{dBm}} + G_{TX,\text{dBi}} + G_{RX,\text{dBi}} + 10\log_{10}\frac{\lambda^2}{16\pi^2R^2} \tag{6.85}$$

More generally, we have the link budget equation

$$P_{RX,\text{dBm}} = P_{TX,\text{dBm}} + G_{TX,\text{dBi}} + G_{RX,\text{dBi}} - L_{\text{pathloss,dB}}(R) \tag{6.86}$$

where $L_{\text{pathloss,dB}}(R)$ is the path loss in dB. For free space propagation, we have from the Friis formula (6.85) that

$$L_{\text{pathloss,dB}}(R) = 10\log_{10}\frac{16\pi^2R^2}{\lambda^2}, \quad \text{path loss in dB for free space propagation} \tag{6.87}$$
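Since (6.87) is monotone in R, it can be inverted in closed form, $R = \frac{\lambda}{4\pi}10^{L/20}$, which is what turns an allowed path loss into a range. The following sketch implements both directions (function names ours, $c = 3 \times 10^8$ m/s as in the text).

```python
import math


def free_space_path_loss_db(r_m, fc_hz):
    """Eq. (6.87): 10 log10(16 pi^2 R^2 / lambda^2), with lambda = c / fc."""
    lam = 3e8 / fc_hz
    return 10.0 * math.log10(16.0 * math.pi ** 2 * r_m ** 2 / lam ** 2)


def free_space_range_m(path_loss_db, fc_hz):
    """Inverse of (6.87): R = lambda / (4 pi) * 10^(L / 20)."""
    lam = 3e8 / fc_hz
    return lam / (4.0 * math.pi) * 10.0 ** (path_loss_db / 20.0)


print(round(free_space_path_loss_db(10.0, 60e9), 1))  # 88.0 (10 m at 60 GHz)
print(round(free_space_range_m(87.0, 5e9)))           # 107 (87 dB allowed at 5 GHz)
```

Free space loss grows by 6 dB for every doubling of range, since it is proportional to $R^2$.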


While the Friis formula is our starting point, the link budget equation (6.86) applies more generally, in that we can substitute other expressions for path loss, depending on the propagation environment. For example, for wireless communication in a cluttered environment, the signal power may decay faster with range than the free space decay of $\frac{1}{R^2}$. A mixture of empirical measurements and statistical modeling is typically used to characterize path loss as a function of range for the environments of interest. For example, the design of wireless cellular systems is accompanied by extensive “measurement campaigns” and modeling. Once we decide on the path loss formula ($L_{\text{pathloss,dB}}(R)$) to be used in the design, the transmit power required to attain a given receiver sensitivity can be determined as a function of range R. Such a path loss formula typically characterizes an “average” operating environment, around which there might be significant statistical variations that are not captured by the model used to arrive at the receiver sensitivity. For example, the receiver sensitivity for a wireless link may be calculated based on the AWGN channel model, whereas the link may exhibit rapid amplitude variations due to multipath fading, and slower variations due to shadowing (e.g., due to buildings and other obstacles). Even if fading/shadowing effects are factored into the channel model used to compute BER, and the model for path loss, the actual environment encountered may be worse than that assumed in the model. In general, therefore, we add a link margin $L_{\text{margin,dB}}$, again expressed in dB, in an attempt to budget for potential performance losses due to unmodeled or unforeseen impairments. The size of the link margin depends, of course, on the confidence of the system designer in the models used to arrive at the rest of the link budget.

Putting all this together, if $P_{RX,\text{dBm}}(\min)$ is the desired receiver sensitivity (i.e., the minimum required received power), then we compute the transmit power for the link to be

Required transmit power

$$P_{TX,\text{dBm}} = P_{RX,\text{dBm}}(\min) - G_{TX,\text{dBi}} - G_{RX,\text{dBi}} + L_{\text{pathloss,dB}}(R) + L_{\text{margin,dB}} \tag{6.88}$$
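In the log domain, (6.88) is plain bookkeeping. The sketch below pairs it with the free space loss (6.87); the function names are ours, and the sample call uses the parameter values that appear in Example 6.5.1 (sensitivity $-83$ dBm, 2 dBi antennas, 20 dB margin, 5 GHz) at a range of 107 m, which should require about 20 dBm.

```python
import math


def free_space_path_loss_db(r_m, fc_hz):
    """Eq. (6.87) with lambda = c / fc and c = 3e8 m/s."""
    lam = 3e8 / fc_hz
    return 10.0 * math.log10(16.0 * math.pi ** 2 * r_m ** 2 / lam ** 2)


def required_tx_power_dbm(sensitivity_dbm, g_tx_dbi, g_rx_dbi, path_loss_db, margin_db):
    """Eq. (6.88): P_TX,dBm = P_RX,dBm(min) - G_TX,dBi - G_RX,dBi + L_pathloss,dB + L_margin,dB."""
    return sensitivity_dbm - g_tx_dbi - g_rx_dbi + path_loss_db + margin_db


print(round(required_tx_power_dbm(-83.0, 2.0, 2.0, free_space_path_loss_db(107.0, 5e9), 20.0), 1))  # 20.0
```

Every dB of antenna gain at either end buys back a dB of transmit power, while every dB of extra margin costs one.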

Let us illustrate these concepts using some examples.


Example 6.5.1 Consider again the 5 GHz WLAN link of Example 5.8.1. We wish to utilize a 20 MHz channel, using Gray coded QPSK and an excess bandwidth of 33%. The receiver has a noise figure of 6 dB.

(a) What is the bit rate?

(b) What is the receiver sensitivity required to achieve a BER of $10^{-6}$?

(c) Assuming transmit and receive antenna gains of 2 dBi each, what is the range achieved for 100 mW transmit power, using a link margin of 20 dB? Use link budget analysis based on free space path loss.

Solution (a) For bandwidth B and fractional excess bandwidth a, the symbol rate

$$R_s = \frac{1}{T} = \frac{B}{1+a} = \frac{20}{1 + 0.33} = 15 \text{ Msymbols/sec}$$

and the bit rate for an M-ary constellation is

$$R_b = R_s\log_2 M = 15 \text{ Msymbols/sec} \times 2 \text{ bits/symbol} = 30 \text{ Mbits/sec}$$

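(b) BER for QPSK with Gray coding is $Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$. For a desired BER of $10^{-6}$, we obtain that $\left(\frac{E_b}{N_0}\right)_{\text{reqd,dB}} \approx 10.2$. Plugging in $R_b = 30$ Mbps and $F = 6$ dB in (6.80), we obtain that the required receiver sensitivity is $P_{RX,\text{dBm}}(\min) = -83$ dBm.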

(c) The transmit power is 100 mW, or 20 dBm. Rewriting (6.88), the allowed path loss to attain the desired sensitivity at the desired link margin is

$$L_{\text{pathloss,dB}}(R) = P_{TX,\text{dBm}} - P_{RX,\text{dBm}}(\min) + G_{TX,\text{dBi}} + G_{RX,\text{dBi}} - L_{\text{margin,dB}} = 20 - (-83) + 2 + 2 - 20 = 87 \text{ dB} \tag{6.89}$$


We can now invert the formula for free space loss, (6.87), noting that $f_c = 5$ GHz, which implies $\lambda = \frac{c}{f_c} = 0.06$ m. We get a range R of 107 meters, which is of the order of the advertised ranges for WLANs under nominal operating conditions. The range decreases, of course, for higher bit rates using larger constellations. What happens, for example, when we use 16QAM or 64QAM?
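The whole calculation of Example 6.5.1 fits in a few lines. The sketch below reproduces parts (a) to (c); the required $E_b/N_0$ of 10.2 dB is taken from part (b) as an input rather than recomputed, and the code keeps $20/1.33$ unrounded, so its intermediate values differ very slightly from the rounded figures in the text. The function name is ours, and the constellation is fixed to QPSK.

```python
import math


def example_6_5_1(ebno_reqd_db=10.2):
    """Parts (a)-(c) of Example 6.5.1; ebno_reqd_db is the value quoted in part (b)."""
    bandwidth_hz, excess_bw, M, noise_figure_db = 20e6, 0.33, 4, 6.0
    symbol_rate = bandwidth_hz / (1.0 + excess_bw)          # (a) about 15 Msymbols/sec
    bit_rate = symbol_rate * math.log2(M)                   # (a) about 30 Mbits/sec
    sensitivity_dbm = ebno_reqd_db + 10.0 * math.log10(bit_rate) - 174.0 + noise_figure_db  # (6.80)
    ptx_dbm, g_dbi, margin_db = 20.0, 2.0, 20.0             # 100 mW is 20 dBm
    path_loss_db = ptx_dbm - sensitivity_dbm + g_dbi + g_dbi - margin_db  # (6.89)
    lam = 3e8 / 5e9                                         # 0.06 m at 5 GHz
    range_m = lam / (4.0 * math.pi) * 10.0 ** (path_loss_db / 20.0)     # invert (6.87)
    return symbol_rate, bit_rate, sensitivity_dbm, path_loss_db, range_m


rs, rb, sens, loss, r = example_6_5_1()
print(f"Rs = {rs / 1e6:.2f} Msym/s, Rb = {rb / 1e6:.2f} Mbps, "
      f"sensitivity = {sens:.1f} dBm, path loss = {loss:.1f} dB, range = {r:.0f} m")
```

Raising `ebno_reqd_db` by some number of dB lowers the allowed path loss by the same amount, and since free space loss grows 6 dB per doubling of range, every 6 dB of extra $E_b/N_0$ halves the range.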


Example 6.5.2 Consider an indoor link at 10 meters range using unlicensed spectrum at 60 GHz. Suppose that the transmitter and receiver each use antennas with horizontal beamwidths of 60° and vertical beamwidths of 30°. Use the following approximation to calculate the resulting antenna gains:

$$G \approx \frac{41000}{\theta_{\text{horiz}}\theta_{\text{vert}}}$$

where G denotes the antenna gain (linear scale), and $\theta_{\text{horiz}}$ and $\theta_{\text{vert}}$ denote horizontal and vertical beamwidths (in degrees). Set the noise figure to 8 dB, and assume a link margin of 10 dB at BER of $10^{-6}$.

(a) Calculate the bandwidth and transmit power required for a 2 Gbps link using Gray coded QPSK and 50% excess bandwidth.

(b) How do your answers change if you change the signaling scheme to Gray coded 16QAM, keeping the same bit rate as in (a)?

(c) If you now employ Gray coded 16QAM keeping the same symbol rate as in (a), what is the bit rate attained and the transmit power required?

(d) How do the answers in the setting of (a) change if you increase the horizontal beamwidth to 120°, keeping all other parameters fixed?

Solution: (a) A 2 Gbps link using QPSK corresponds to a symbol rate of 1 Gsymbols/sec. Factoring in the 50% excess bandwidth, the required bandwidth is $B = 1.5$ GHz. The target BER and constellation are as in the previous example, hence we still have $(E_b/N_0)_{\text{reqd,dB}} \approx 10.2$ dB. Plugging in $R_b = 2$ Gbps and $F = 8$ dB in (6.80), we obtain that the required receiver sensitivity is $P_{RX,\text{dBm}}(\min) = -62.8$ dBm.

The antenna gains at each end are given by

$$G \approx \frac{41000}{60 \times 30} = 22.78$$

Converting to dB scale, we obtain $G_{TX,\text{dBi}} = G_{RX,\text{dBi}} = 13.58$ dBi.

The  transmit  power  for  a  range  of  10  m  can  now  be  obtained  using  (6.88)  to  be  8.1  dBm. 

(b) For the same bit rate of 2 Gbps, the symbol rate for 16QAM is 0.5 Gsymbols/sec, so that the bandwidth required is 0.75 GHz, factoring in 50% excess bandwidth. The nearest neighbors approximation to BER for Gray coded 16QAM is given by $Q\left(\sqrt{\frac{4E_b}{5N_0}}\right)$. Using this, we find that a target BER of $10^{-6}$ requires $(E_b/N_0)_{\text{reqd,dB}} \approx 14.54$ dB, an increase of 4.34 dB relative to (a). This leads to a corresponding increase in the receiver sensitivity to $-58.45$ dBm, which leads to the required transmit power increasing to 12.4 dBm.

(c) If we keep the symbol rate fixed at 1 Gsymbols/sec, the bit rate with 16QAM is $R_b = 1 \text{ Gsymbols/sec} \times 4 \text{ bits/symbol} = 4$ Gbps. As in (b), $(E_b/N_0)_{\text{reqd,dB}} \approx 14.54$ dB. The receiver sensitivity is therefore given by $-55.45$ dBm, a 3 dB increase over (b), corresponding to the doubling of the bit rate. This translates directly to a 3 dB increase, relative to (b), in transmit power to 15.4 dBm, since the path loss, antenna gains, and link margin are as in (b).

(d)  We  now  go  back  to  the  setting  of  (a),  but  with  different  antenna  gains.  The  bandwidth  is, 
of  course,  unchanged  from  (a).  The  new  antenna  gains  are  3  dB  smaller  because  of  the  doubling 
of  horizontal  beamwidth.  The  receiver  sensitivity,  path  loss  and  link  margin  are  as  in  (a),  thus 
the  3  dB  reduction  in  antenna  gains  at  each  end  must  be  compensated  for  by  a  6  dB  increase  in 
transmit  power  relative  to  (a).  Thus,  the  required  transmit  power  is  14.1  dBm. 
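All four parts of Example 6.5.2 follow from the same chain of (6.80), the beamwidth approximation, (6.87) and (6.88), so a single function covers them. In the sketch below the required $E_b/N_0$ values (10.2 dB for QPSK and 14.54 dB for 16QAM) are taken from the solution text rather than recomputed, and the function name and defaults are our own packaging of the example's parameters.

```python
import math


def db(x):
    """Convert a linear ratio to dB."""
    return 10.0 * math.log10(x)


def tx_power_dbm(ebno_reqd_db, bit_rate, horiz_deg=60.0, vert_deg=30.0,
                 range_m=10.0, fc_hz=60e9, noise_figure_db=8.0, margin_db=10.0):
    """Transmit power (6.88) for the 60 GHz link of Example 6.5.2."""
    gain_dbi = db(41000.0 / (horiz_deg * vert_deg))                      # beamwidth approximation
    sensitivity = ebno_reqd_db + db(bit_rate) - 174.0 + noise_figure_db  # (6.80)
    lam = 3e8 / fc_hz
    path_loss = db(16.0 * math.pi ** 2 * range_m ** 2 / lam ** 2)        # (6.87)
    return sensitivity - 2.0 * gain_dbi + path_loss + margin_db


answers = {
    "a": tx_power_dbm(10.2, 2e9),                   # QPSK, 2 Gbps
    "b": tx_power_dbm(14.54, 2e9),                  # 16QAM, same bit rate
    "c": tx_power_dbm(14.54, 4e9),                  # 16QAM, same symbol rate
    "d": tx_power_dbm(10.2, 2e9, horiz_deg=120.0),  # QPSK, wider horizontal beams
}
for part, p in answers.items():
    print(f"({part}) {p:.1f} dBm")
```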


Discussion: The parameter choices in the preceding examples illustrate how physical characteristics of the medium change with choice of carrier frequency, and affect system design tradeoffs. The 5 GHz system in Example 6.5.1 employs essentially omnidirectional antennas with small gains of 2 dBi, whereas it is possible to realize highly directional yet small antennas (e.g., using electronically steerable printed circuit antenna arrays) for the 60 GHz system in Example 6.5.2 by virtue of the small (5 mm) wavelength. 60 GHz waves are easily blocked by walls, hence the range in Example 6.5.2 corresponds to in-room communication. We have also chosen parameters such that the transmit power required for 60 GHz is smaller than that at 5 GHz, since it is more difficult to produce power at higher radio frequencies. Finally, the link margin for 5 GHz is chosen higher than for 60 GHz: propagation at 60 GHz is near line-of-sight, whereas fading due to multipath propagation at 5 GHz can be more significant, and hence may require a higher link margin relative to the AWGN benchmark which provides the basis for our link budget.


6.6  Concept  Inventory 


This chapter establishes a systematic hypothesis testing based framework for demodulation, develops tools for performance evaluation which enable exploration of the power-bandwidth tradeoffs exhibited by different signaling schemes, and relates these mathematical models to physical link parameters via the link budget. A summary of some key concepts and results is as follows.

Hypothesis  testing 

• The probability of error is minimized by choosing the hypothesis with the maximum a posteriori probability (i.e., the hypothesis that is most likely conditioned on the observation). That is, the MPE rule is also the MAP rule:

$$\delta_{MPE}(y) = \delta_{MAP}(y) = \arg\max_{1 \le i \le M} P[H_i|Y = y] = \arg\max_{1 \le i \le M} \pi_i p(y|i) = \arg\max_{1 \le i \le M} \left(\log\pi_i + \log p(y|i)\right)$$

For equal priors, the MPE rule coincides with the ML rule:

$$\delta_{ML}(y) = \arg\max_{1 \le i \le M} p(y|i) = \arg\max_{1 \le i \le M} \log p(y|i)$$


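• For binary hypothesis testing, ML and MPE rules can be written as likelihood, or log likelihood, ratio tests:

$$L(y) = \frac{p_1(y)}{p_0(y)} \mathop{\gtrless}_{H_0}^{H_1} 1 \quad \text{or} \quad \log L(y) \mathop{\gtrless}_{H_0}^{H_1} 0 \qquad \text{ML rule}$$

$$L(y) = \frac{p_1(y)}{p_0(y)} \mathop{\gtrless}_{H_0}^{H_1} \frac{\pi_0}{\pi_1} \quad \text{or} \quad \log L(y) \mathop{\gtrless}_{H_0}^{H_1} \log\frac{\pi_0}{\pi_1} \qquad \text{MPE/MAP rule}$$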


Geometric  view  of  signals 

Continuous-time signals can be interpreted as vectors in Euclidean space, with inner product $\langle s_1, s_2\rangle = \int s_1(t)s_2^*(t)\,dt$, norm $\|s\| = \sqrt{\langle s, s\rangle}$, and energy $\|s\|^2 = \langle s, s\rangle$. Two signals are orthogonal if their inner product is zero.

Geometric view of WGN

• WGN $n(t)$ with PSD $\sigma^2 = \frac{N_0}{2}$, when projected in any “direction” (i.e., correlated against any unit energy signal), yields an $N(0, \sigma^2)$ random variable.

• More generally, projections of the noise along any signals are jointly Gaussian, with zero mean and $\mathrm{cov}(\langle n, u\rangle, \langle n, v\rangle) = \sigma^2\langle v, u\rangle$.

• Noise projections along orthogonal signals are uncorrelated. Since they are jointly Gaussian, they are also independent.

Signal space

• M-ary signaling in AWGN in continuous time can be reduced, without loss of information, to M-ary signaling in finite-dimensional vector space with each dimension seeing i.i.d. $N(0, \sigma^2)$ noise, which corresponds to discrete time WGN. This is accomplished by projecting the received signal onto the signal space spanned by the M possible signals.

• Decision rules derived using hypothesis testing in the finite-dimensional signal space map directly back to continuous time for two key reasons: signal inner products are preserved, and the noise component orthogonal to the signal space is irrelevant. Because of this equivalence, we can stop making a distinction between continuous time signals and finite-dimensional vector signals in our notation.

Optimal  demodulation 

• For the model $H_i: y = s_i + n$, $0 \le i \le M-1$, optimum demodulation involves computation of the correlator outputs $Z_i = \langle y, s_i\rangle$. This can be accomplished by using a bank of correlators or matched filters, but any other receiver structure that yields the statistics $\{Z_i\}$ would also preserve all of the relevant information.

• The ML and MPE rules are given by

$$\delta_{ML}(y) = \arg\max_{0 \le i \le M-1} \langle y, s_i\rangle - \frac{\|s_i\|^2}{2}$$

$$\delta_{MPE}(y) = \arg\max_{0 \le i \le M-1} \langle y, s_i\rangle - \frac{\|s_i\|^2}{2} + \sigma^2\log\pi_i$$

When the received signal lies in a finite-dimensional space in which the noise has finite energy, the ML rule can be written as a minimum distance rule (and the MPE rule as a variant thereof) as follows:

$$\delta_{ML}(y) = \arg\min_{0 \le i \le M-1} \|y - s_i\|^2$$

$$\delta_{MPE}(y) = \arg\min_{0 \le i \le M-1} \|y - s_i\|^2 - 2\sigma^2\log\pi_i$$

Geometry  of  ML  rule:  ML  decision  boundaries  are  formed  from  hyperplanes  that  bisect  lines 
connecting  signal  points. 
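The correlator form and the minimum distance form are the same rule, because $\|y - s_i\|^2 - 2\sigma^2\log\pi_i = \|y\|^2 - 2\left(\langle y, s_i\rangle - \frac{\|s_i\|^2}{2} + \sigma^2\log\pi_i\right)$ and $\|y\|^2$ does not depend on i. The sketch below checks this numerically for real-valued vectors. The random signal set, priors, noise level and seed are arbitrary illustrative choices, and equal priors reduce both functions to the ML rule.

```python
import math
import random


def dot(a, b):
    """Inner product of two real vectors."""
    return sum(x * z for x, z in zip(a, b))


def decide_correlator(y, signals, priors, sigma2):
    """argmax_i <y, s_i> - ||s_i||^2 / 2 + sigma^2 log(pi_i); equal priors give the ML rule."""
    metrics = [dot(y, s) - 0.5 * dot(s, s) + sigma2 * math.log(p) for s, p in zip(signals, priors)]
    return max(range(len(signals)), key=metrics.__getitem__)


def decide_min_distance(y, signals, priors, sigma2):
    """argmin_i ||y - s_i||^2 - 2 sigma^2 log(pi_i)."""
    def metric(i):
        diff = [a - b for a, b in zip(y, signals[i])]
        return dot(diff, diff) - 2.0 * sigma2 * math.log(priors[i])
    return min(range(len(signals)), key=metric)


def agreement_rate(trials=2000, dim=3, priors=(0.4, 0.3, 0.2, 0.1), sigma2=0.5, seed=0):
    """Fraction of noisy observations on which the two forms of the rule agree."""
    rng = random.Random(seed)
    M = len(priors)
    signals = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(M)]
    sigma = math.sqrt(sigma2)
    agree = 0
    for _ in range(trials):
        i = rng.choices(range(M), weights=priors)[0]  # draw the true hypothesis from the priors
        y = [s + rng.gauss(0.0, sigma) for s in signals[i]]
        agree += decide_correlator(y, signals, priors, sigma2) == decide_min_distance(y, signals, priors, sigma2)
    return agree / trials


print(agreement_rate())
```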

Performance  analysis 

• For binary signaling, the error probability for the ML rule is given by

$$P_e = Q\left(\frac{d}{2\sigma}\right) = Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right)$$

where $d = \|s_1 - s_0\|$ is the Euclidean distance between the signals. The performance therefore depends on the power efficiency $\eta_P = \frac{d^2}{E_b}$ and the SNR $E_b/N_0$. Since the power efficiency is scale-invariant, we may choose any convenient scaling when computing it for a given constellation.

• For M-ary signaling, closed form expressions for the error probability may not be available, but we know that the performance depends only on the scale-invariant inner products $\left\{\frac{\langle s_i, s_j\rangle}{E_b}\right\}$, which depend on the constellation “shape” alone, and on $E_b/N_0$.

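• The conditional error probabilities for M-ary signaling can be bounded using the union bound (these can then be averaged to obtain an upper bound on the average error probability):

$$P_{e|i} \le \sum_{j \neq i} Q\left(\frac{d_{ij}}{2\sigma}\right) = \sum_{j \neq i} Q\left(\sqrt{\frac{d_{ij}^2}{E_b}\,\frac{E_b}{2N_0}}\right)$$

where $d_{ij} = \|s_i - s_j\|$ are the pairwise distances between signal points.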

• When we understand the shape of the decision regions, we can tighten the union bound into an intelligent union bound:

$$P_{e|i} \le \sum_{j \in N_{ML}(i)} Q\left(\frac{d_{ij}}{2\sigma}\right)$$

where $N_{ML}(i)$ denotes the set of neighbors of $s_i$ which define the decision region $\Gamma_i$.
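• For regular constellations, the nearest neighbors approximation is given by

$$P_e \approx N_{d_{\min}}\,Q\left(\frac{d_{\min}}{2\sigma}\right) = N_{d_{\min}}\,Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right)$$

with $\eta_P = \frac{d_{\min}^2}{E_b}$ providing a measure of power efficiency which can be used to compare across constellations.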

•  If Gray coding is possible, the bit error probability can be estimated as

$$P(\text{bit error}) \approx \frac{\bar{N}_{d_{min}}}{\log_2 M}\, Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right)$$
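The performance formulas in this list can be evaluated directly from a constellation's coordinates. The sketch below assumes equiprobable signal points given as real coordinate tuples, $E_b/N_0$ supplied on a linear (not dB) scale, and $Q(x)$ computed from the complementary error function; the helper names are hypothetical. The optional `neighbors` argument turns the union bound into the intelligent union bound when the caller supplies the index set $N_{ML}(i)$.

```python
import itertools
import math

def Q(x):
    """Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def metrics(points):
    """Return (E_b, d_min, average number of d_min neighbors, eta_P = d_min^2 / E_b)."""
    M = len(points)
    Es = sum(sum(c * c for c in p) for p in points) / M   # average symbol energy
    Eb = Es / math.log2(M)
    dists = [math.dist(a, b) for a, b in itertools.permutations(points, 2)]
    dmin = min(dists)
    n_pairs = sum(1 for d in dists if math.isclose(d, dmin))  # ordered pairs at d_min
    return Eb, dmin, n_pairs / M, dmin ** 2 / Eb

def union_bound(points, i, ebno, neighbors=None):
    """Union bound on P_{e|i}; pass `neighbors` for the intelligent union bound."""
    Eb = metrics(points)[0]
    idx = neighbors if neighbors is not None else [j for j in range(len(points)) if j != i]
    return sum(Q(math.sqrt(math.dist(points[i], points[j]) ** 2 / Eb * ebno / 2)) for j in idx)

def nn_ser(points, ebno):
    """Nearest neighbors approximation N_dmin * Q(sqrt(eta_P * (Eb/N0) / 2))."""
    _, _, nbar, eta = metrics(points)
    return nbar * Q(math.sqrt(eta * ebno / 2))

def gray_ber(points, ebno):
    """Bit error estimate, assuming a Gray labeling exists."""
    return nn_ser(points, ebno) / math.log2(len(points))

qam16 = [(x, y) for x in (-3, -1, 1, 3) for y in (-3, -1, 1, 3)]
print(metrics(qam16))   # (2.5, 2.0, 3.0, 1.6)
```

Because $\eta_P$ is a ratio of squared distance to energy, scaling every coordinate by the same factor leaves it unchanged, which is the scale invariance noted above.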


Link budget: This relates (e.g., using the Friis formula for free space propagation) the performance of a communication link to physical parameters such as transmit power, transmit and receive antenna gains, range, and receiver noise figure. A link margin is typically introduced to account for unmodeled impairments.
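As a hedged numerical illustration of such a link budget, the sketch below combines the Friis free-space formula $P_{RX} = P_{TX} G_{TX} G_{RX} \left(\frac{\lambda}{4\pi R}\right)^2$ with the thermal noise density $kT_0 \approx -174$ dBm/Hz (at $T_0 = 290$ K) to obtain a receiver sensitivity and the maximum range. All numerical parameters in the example (transmit power, antenna gains, carrier frequency, bit rate, required $E_b/N_0$, noise figure, margin) are hypothetical values chosen only for illustration, and the function names are likewise hypothetical.

```python
import math

C = 3e8  # speed of light in m/s (approximate)

def sensitivity_dbm(bit_rate, ebno_req_db, nf_db):
    """Minimum received power: kT0 (dBm/Hz) + noise figure + 10 log10(Rb) + (Eb/N0)_req."""
    return -174 + nf_db + 10 * math.log10(bit_rate) + ebno_req_db

def max_range_m(ptx_dbm, gtx_dbi, grx_dbi, freq_hz, sens_dbm, margin_db):
    """Largest R with P_TX + G_TX + G_RX - 20 log10(4 pi R / lambda) >= sensitivity + margin."""
    lam = C / freq_hz
    allowed_path_loss_db = ptx_dbm + gtx_dbi + grx_dbi - sens_dbm - margin_db
    return lam / (4 * math.pi) * 10 ** (allowed_path_loss_db / 20)

# Hypothetical example: 10 Mbps, 10 dB required Eb/N0, 7 dB noise figure
sens = sensitivity_dbm(10e6, 10.0, 7.0)            # about -87 dBm
r = max_range_m(20.0, 0.0, 0.0, 5e9, sens, 20.0)   # 20 dBm, 0 dBi antennas, 5 GHz, 20 dB margin
print(round(sens, 1), round(r, 1))
```

Since free-space received power falls off as $1/R^2$, each additional 6 dB of margin (or required $E_b/N_0$) roughly halves the achievable range.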


6.7  Endnotes 


The geometric signal space approach for deriving and analyzing optimal receivers is now standard in textbooks on communication theory, such as [7, 8]. It was first developed by Russian pioneer Vladimir Kotelnikov [29], and presented in a cohesive fashion in the classic textbook by Wozencraft and Jacobs [9].

A number of details of receiver design have been swept under the rug in this chapter. Our model for the received signal is that it equals the transmitted signal plus WGN. In practice, the transmitted signal can be significantly distorted by the channel (e.g., scaling, delay, multipath propagation). However, the basic M-ary signaling model is still preserved: if M possible signals are sent, then, prior to the addition of noise, M possible signals are received after the deterministic (but a priori unknown) transformations due to channel impairments. The receiver can therefore estimate noiseless copies of the latter and then apply the optimum demodulation techniques developed here. This approach leads, for example, to the optimal equalization strategies developed by Forney [30] and Ungerboeck [31]; see Chapter 5 of [7] for a textbook exposition. Estimation of the noiseless received signals involves tasks such as carrier phase and frequency synchronization, timing synchronization, and estimation of the channel impulse response or transfer function. In modern digital communication transceivers, these operations are typically all performed using DSP on the complex baseband received signal. Perhaps the best approach for exploring further is to acquire a basic understanding of the relevant estimation techniques, and to then go to technical papers of specific interest (e.g., IEEE conference and journal publications). Classic texts covering estimation theory include Kay [32], Poor [33] and Van Trees [34]. Several graduate texts in communications contain a brief discussion of the modern estimation-theoretic approach to synchronization that may provide a helpful orientation prior to going to the research literature; for example, see [7] (Chapter 4) and [11, 35] (Chapter 8).
Problems


Hypothesis  Testing 


Problem 6.1 The received signal in a digital communication system is given by

$$y(t) = \begin{cases} s(t) + n(t), & 1 \text{ sent} \\ n(t), & 0 \text{ sent} \end{cases}$$

where $n$ is AWGN with PSD $N_0/2$ and $s(t)$ is as shown below. The received signal is passed through a filter, and the output is sampled to yield a decision statistic. An ML decision rule is employed based on the decision statistic. The set-up is shown in Figure 6.30.

Figure 6.30: Set-up for Problem 6.1

(a) For $h(t) = s(-t)$, find the error probability as a function of $E_b/N_0$ if $t_0 = 1$.

(b) Can the error probability in (a) be improved by choosing the sampling time $t_0$ differently?

(c) Now, find the error probability as a function of $E_b/N_0$ for $h(t) = I_{[0,2]}(t)$ and the best possible choice of sampling time.

(d) Finally, comment on whether you can improve the performance in (c) by using a linear combination of two samples as a decision statistic, rather than just using one sample.


Problem 6.2 Consider binary hypothesis testing based on the decision statistic $Y$, where $Y \sim N(2, 9)$ under $H_1$ and $Y \sim N(-2, 4)$ under $H_0$.

(a) Show that the optimal (ML or MPE) decision rule is equivalent to comparing a function of the form $ay^2 + by$ to a threshold.

(b) Specify the MPE rule explicitly (i.e., specify $a$, $b$ and the threshold) when $\pi_0 = \ldots$

(c) Express the conditional error probability $P_{e|0}$ for the decision rule in (b) in terms of the Q function with positive arguments. Also provide a numerical value for this probability.

Problem 6.3 Find and sketch the decision regions for a binary hypothesis testing problem with observation $Z$, where the hypotheses are equally likely, and the conditional distributions are given by

$H_0$: $Z$ is uniform over $[-2, 2]$

$H_1$: $Z$ is Gaussian with mean 0 and variance 1.
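One way to check the regions sketched for Problem 6.3 is to compare the two conditional densities numerically. Since the hypotheses are equally likely, the ML and MPE rules coincide and decide $H_1$ wherever the $N(0,1)$ density exceeds the uniform density ($1/4$ on $[-2, 2]$, zero outside). The function names are hypothetical; this is a numerical sanity check under those assumptions, not a substitute for the derivation.

```python
import math

def p0(z):
    """Uniform density on [-2, 2] (hypothesis H0)."""
    return 0.25 if -2 <= z <= 2 else 0.0

def p1(z):
    """Standard Gaussian density (hypothesis H1)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def ml_decide(z):
    """Return 1 (H1) if p1(z) > p0(z), else 0 (H0)."""
    return 1 if p1(z) > p0(z) else 0

# Inside [-2, 2], p1(z) > 1/4 exactly when |z| < z_star
z_star = math.sqrt(2 * math.log(4 / math.sqrt(2 * math.pi)))
print(round(z_star, 3))                                        # 0.967
print([ml_decide(z) for z in (-3.0, -1.5, 0.0, 1.5, 3.0)])    # [1, 0, 1, 0, 1]
```

The output indicates that $\Gamma_1$ consists of a central interval around the origin together with the two tails $|z| > 2$, where the uniform density vanishes.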


Problem 6.4 The receiver in a binary communication system employs a decision statistic $Z$ which behaves as follows:

$Z = N$ if 0 is sent
$Z = 4 + N$ if 1 is sent

where $N$ is modeled as Laplacian with density

$$p_N(x) = \ldots, \quad -\infty < x < \infty$$
Note: Parts (a) and (b) can be done independently.

(a) Find and sketch, as a function of $z$, the log likelihood ratio

$$\log L(z) = \log \frac{p(z|1)}{p(z|0)}$$

where $p(z|i)$ denotes the conditional density of $Z$ given that $i$ is sent ($i = 0, 1$).

(b) Find $P_{e|1}$, the conditional error probability given that 1 is sent, for the decision rule

$$\delta(z) = \begin{cases} 0, & z < 1 \\ 1, & z > 1 \end{cases}$$


(c) Is the rule in (b) the MPE rule for any choice of prior probabilities? If so, specify the prior probability $\pi_0 = P[0 \text{ sent}]$ for which it is the MPE rule. If not, say why not.


Problem 6.5 Consider the MAP/MPE rule for the hypothesis testing problem in Example 6.1.1.

(a) Show that the MAP rule always says $H_1$ if the prior probability of $H_0$ is smaller than some positive threshold. Specify this threshold.

(b) Compute and plot the conditional probabilities $P_{e|0}$ and $P_{e|1}$, and the average error probability $P_e$, versus $\pi_0$ as the latter varies in $[0, 1]$.

(c) Discuss any trends that you see from the plots in (b).


Problem  6.6  Consider  a  MAP  receiver  for  the  basic  Gaussian  example,  as  discussed  in  Example 
6.1.2.  Fix  SNR  at  13  dB.  We  wish  to  explore  the  effect  of  prior  mismatch,  by  quantifying  the 
performance  degradation  of  a  MAP  receiver  if  the  actual  priors  are  different  from  the  priors  for 
which  it  has  been  designed. 

(a) Plot the average error probability for a MAP receiver designed for $\pi_0 = 0.2$, as $\pi_0$ varies from 0 to 1. As usual, use a log scale for the probabilities. On the same plot, also plot the error probability of the ML receiver as a benchmark.

(b)  From  the  plot  in  (a),  comment  on  how  much  error  you  can  tolerate  in  the  prior  probabilities 
before  the  performance  of  the  MAP  receiver  designed  for  the  given  prior  becomes  unacceptable. 

(c) Repeat (a) and (b) for a MAP receiver designed for $\pi_0 = 0.4$. Is the performance more or less sensitive to errors in the priors?

Problem 6.7 Consider binary hypothesis testing in which the observation $Y$ is modeled as uniformly distributed over $[-2, 2]$ under $H_0$, and has conditional density $p(y|1) = c(1 - |y|/3) I_{[-3,3]}(y)$ under $H_1$, where $c > 0$ is a constant to be determined.

(a) Find $c$.

(b) Find and sketch the decision regions $\Gamma_0$ and $\Gamma_1$ corresponding to the ML decision rule.

(c) Find the conditional error probabilities.

Problem 6.8 Consider binary hypothesis testing with scalar observation $Y$. Under hypothesis $H_0$, $Y$ is modeled as uniformly distributed over $[-5, 5]$. Under $H_1$, $Y$ has conditional density $p(y|1) = \ldots$, $-\infty < y < \infty$.

(a) Specify the ML rule and clearly draw the decision regions $\Gamma_0$ and $\Gamma_1$ on the real line.

(b)  Find  the  conditional  probabilities  of  error  for  the  ML  rule  under  each  hypothesis. 

Problem 6.9 For the setting of Problem 6.8, suppose that the prior probability of $H_0$ is 1/3.

(a)  Specify  the  MPE  rule  and  draw  the  decision  regions. 

(b)  Find  the  conditional  error  probabilities  and  the  average  error  probability.  Compare  with  the 
corresponding  quantities  for  the  ML  rule  considered  in  Problem  6.8. 
Problem 6.10 The receiver output $Z$ in an on-off keyed optical communication system is modeled as a Poisson random variable with mean $m_0 = 1$ if 0 is sent, and mean $m_1 = 10$ if 1 is sent.

(a)  Show  that  the  ML  rule  consists  of  comparing  Z  to  a  threshold,  and  specify  the  numerical 
value  of  the  threshold.  Note  that  Z  can  only  take  nonnegative  integer  values. 

(b)  Compute  the  conditional  error  probabilities  for  the  ML  rule  (compute  numerical  values  in 
addition  to  deriving  formulas). 

(c)  Find  the  MPE  rule  if  the  prior  probability  of  sending  1  is  0.1. 

(d)  Compute  the  average  error  probability  for  the  MPE  rule. 
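For the numerical parts of Problem 6.10, the Poisson probabilities can be evaluated directly. The sketch below (hypothetical helper names) scans the nonnegative integers for the smallest $z$ at which $\pi_1 P(z|1) > \pi_0 P(z|0)$, breaking ties in favor of 0; because the log likelihood ratio $z \ln(m_1/m_0) - (m_1 - m_0)$ is increasing in $z$ when $m_1 > m_0$, the resulting rule has the form "decide 1 iff $Z \ge$ threshold". It is meant for checking hand calculations, not replacing them.

```python
import math

def poisson_pmf(k, m):
    """P[Z = k] for a Poisson random variable with mean m."""
    return math.exp(-m) * m ** k / math.factorial(k)

def threshold(m0, m1, pi1=0.5):
    """Smallest integer z with pi1*P(z|1) > pi0*P(z|0); ties go to 0. Assumes m1 > m0."""
    pi0 = 1 - pi1
    z = 0
    while pi1 * poisson_pmf(z, m1) <= pi0 * poisson_pmf(z, m0):
        z += 1
    return z

def cond_errors(m0, m1, thr):
    """(P_{e|0}, P_{e|1}) for the rule: decide 1 iff Z >= thr."""
    p_below_0 = sum(poisson_pmf(k, m0) for k in range(thr))
    p_below_1 = sum(poisson_pmf(k, m1) for k in range(thr))
    return 1 - p_below_0, p_below_1

thr_ml = threshold(1, 10)             # ML: equal priors
thr_mpe = threshold(1, 10, pi1=0.1)   # MPE with P[1 sent] = 0.1
pe0, pe1 = cond_errors(1, 10, thr_mpe)
avg_mpe = 0.9 * pe0 + 0.1 * pe1       # average error probability of the MPE rule
print(thr_ml, cond_errors(1, 10, thr_ml), thr_mpe, avg_mpe)
```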


Problem 6.11 The received sample $Y$ in a binary communication system is modeled as follows: $Y = A + N$ if 0 is sent, and $Y = -A + N$ if 1 is sent, where $N$ is Laplacian noise with density

$$p_N(x) = \ldots, \quad -\infty < x < \infty$$

(a) Find the ML decision rule. Simplify as much as possible.

(b) Find the conditional error probabilities for the ML rule.

(c) Now, suppose that the prior probability of sending 0 is 1/3. Find the MPE rule, simplifying as much as possible.

(d) In the setting of (c), find the LLR $\log \ldots$.


Problem 6.12 Consider binary hypothesis testing with scalar observation $Y$. Under hypothesis $H_0$, $Y$ is modeled as an exponential random variable with mean 5. Under hypothesis $H_1$, $Y$ is modeled as uniformly distributed over the interval $[0, 10]$.

(a) Specify the ML rule and clearly draw the decision regions $\Gamma_0$ and $\Gamma_1$ on the real line.

(b) Find the conditional probability of error for the ML rule, given that $H_0$ is true.

(c) Suppose that the prior probability of $H_0$ is 1/3. Compute the posterior probability of $H_0$ given that we observe $Y = 4$ (i.e., find $P[H_0 | Y = 4]$).


Problem 6.13 Consider hypothesis testing in which the observation $Y$ is given by the following model:

$H_1$: $Y = 6 + N$
$H_0$: $Y = N$

where the noise $N$ has density $p_N(x) = \frac{1}{10}\left(1 - \frac{|x|}{10}\right) I_{[-10,10]}(x)$.

(a) Find the conditional error probability given $H_1$ for the following decision rule:

$$Y \underset{H_0}{\overset{H_1}{\gtrless}} \ldots$$

(b) Is there a set of prior probabilities for which the decision rule in (a) minimizes the error probability? If so, specify them. If not, say why not.


Receiver  design  and  performance  analysis  for  the  AWGN  channel 

Problem 6.14 Consider binary signaling in AWGN, with $s_1(t) = (1 - \ldots)$ and $s_0(t) = -s_1(t)$. The received signal is given by $y(t) = s_i(t) + n(t)$, $i = 0, 1$, where the noise $n$ has PSD $\sigma^2 = N_0/2 = \ldots$. For all error probabilities computed in this problem, specify in terms of the Q function with positive arguments and also give numerical values.

(a) How would you implement the ML receiver using the received signal $y(t)$? What is its conditional error probability given that $s_0$ is sent?

Now, consider a suboptimal receiver, where the receiver generates the following decision statistics:

$$y_0 = \int_{-1}^{-0.5} y(t)\,dt, \quad y_1 = \int_{-0.5}^{0} y(t)\,dt, \quad y_2 = \int_{0}^{0.5} y(t)\,dt, \quad y_3 = \int_{0.5}^{1} y(t)\,dt$$

(b) Specify the conditional distribution of $\mathbf{y} = (y_0, y_1, y_2, y_3)^T$, conditioned on $s_0$ being sent.

(c) Specify the ML rule when the observation is $\mathbf{y}$. What is its conditional error probability given that $s_0$ is sent?

(d) Specify the ML rule when the observation is $y_0 + y_1 + y_2 + y_3$. What is its conditional error probability, given that $s_0$ is sent?

(e)  Among  the  error  probabilities  in  (a),  (c)  and  (d),  which  is  the  smallest?  Which  is  the  biggest? 
Could  you  have  rank  ordered  these  error  probabilities  without  actually  computing  them? 

Problem 6.15 The received signal in an on-off keyed digital communication system is given by

$$y(t) = \begin{cases} s(t) + n(t), & 1 \text{ sent} \\ n(t), & 0 \text{ sent} \end{cases}$$

where $n$ is AWGN with PSD $N_0/2$, and $s(t) = A(1 - \ldots)$, where $A > 0$. The received signal is passed through a filter with impulse response $h(t) = I_{[0,1]}(t)$ to obtain $z(t) = (y * h)(t)$.
Remark: It would be helpful to draw a picture of the system before you start doing the calculations.

(a) Consider the decision statistic $Z = z(0) + z(1)$. Specify the conditional distribution of $Z$ given that 0 is sent, and the conditional distribution of $Z$ given that 1 is sent.

(b) Assuming that the receiver must make its decision based on $Z$, specify the ML rule and its error probability in terms of $E_b/N_0$ (express your answer in terms of the Q function with positive arguments).

(c) Find the error probability (in terms of $E_b/N_0$) for ML decisions based on the decision statistic $Z_2 = z(0) + z(0.5) + z(1)$.


Figure 6.31: Signal Set for Problem 6.16.


Problem 6.16 Consider binary signaling in AWGN using the signals depicted in Figure 6.31. The received signal is given by

$$y(t) = \begin{cases} s_1(t) + n(t), & 1 \text{ sent} \\ s_0(t) + n(t), & 0 \text{ sent} \end{cases}$$

where $n(t)$ is WGN with PSD $N_0/2$.

(a) Show that the ML decision rule can be implemented by comparing $Z = \int y(t) a(t)\,dt$ to a threshold $\gamma$. Sketch $a(t)$ and specify the corresponding value of $\gamma$.

(b)  Specify  the  error  probability  of  the  ML  rule  as  a  function  of  Eb/No. 

(c)  Can  the  MPE  rule,  assuming  that  the  prior  probability  of  sending  0  is  1/3,  be  implemented 
using  the  same  receiver  structure  as  in  (a)?  What  would  need  to  change?  (Be  specific.) 

(d) Consider now a suboptimal receiver structure in which $y(t)$ is passed through a filter with impulse response $h(t) = I_{[0,1]}(t)$, and we take three samples: $Z_1 = (y * h)(1)$, $Z_2 = (y * h)(2)$, $Z_3 = (y * h)(3)$. Specify the conditional distribution of $Z = (Z_1, Z_2, Z_3)^T$ given that 0 is sent.

(e) (more challenging) Specify the ML rule based on $Z$ and the corresponding error probability as a function of $E_b/N_0$.

Problem 6.17 Let $p_1(t) = I_{[0,1]}(t)$ denote a rectangular pulse of unit duration. Consider two 4-ary signal sets as follows:

Signal Set A: $s_i(t) = p_1(t - i)$, $i = 0, 1, 2, 3$.

Signal Set B: $s_0(t) = p_1(t) + p_1(t - 3)$, $s_1(t) = p_1(t - 1) + p_1(t - 2)$, $s_2(t) = p_1(t) + p_1(t - 2)$, $s_3(t) = p_1(t - 1) + p_1(t - 3)$.

(a) Find signal space representations for each signal set with respect to the orthonormal basis $\{p_1(t - i), i = 0, 1, 2, 3\}$.

(b) Find union bounds on the average error probabilities for both signal sets as a function of $E_b/N_0$. At high SNR, what is the penalty in dB for using signal set B?

(c) Find an exact expression for the average error probability for signal set B as a function of $E_b/N_0$.


Figure  6.32:  Signal  Set  for  Problem  6.18 


Problem 6.18 Consider the 4-ary signaling set shown in Figure 6.32, to be used over an AWGN channel.

(a) Find a union bound, as a function of $E_b/N_0$, on the conditional probability of error given that $c(t)$ is sent.

(b) True or False: This constellation is more power efficient than QPSK. Justify your answer.


Problem 6.19 Three 8-ary signal constellations are shown in Figure 6.33.

(a) Express $R$ and … in terms of … so that all three constellations have the same $E_b$.

(b) For a given $E_b/N_0$, which constellation do you expect to have the smallest bit error probability over a high SNR AWGN channel?
Figure 6.33: Signal constellations for Problem 6.19 (panels include QAM1 and QAM2)


(c) For each constellation, determine whether you can label signal points using 3 bits so that the label for nearest neighbors differs by at most one bit. If so, find such a labeling. If not, say why not and find some “good” labeling.

(d) For the labelings found in part (c), compute nearest neighbors approximations for the average bit error probability as a function of $E_b/N_0$ for each constellation. Evaluate these approximations for $E_b/N_0 = 15$ dB.


Problem 6.20 Consider the signal constellation shown in Figure 6.34, which consists of two QPSK constellations of different radii, offset from each other by … . The constellation is to be used to communicate over a passband AWGN channel.

Figure 6.34: Constellation for Problem 6.20

(a) Carefully redraw the constellation (roughly to scale, to the extent possible) for $r = 1$ and $R = \sqrt{2}$. Sketch the ML decision regions.

(b) For $r = 1$ and $R = \sqrt{2}$, find an intelligent union bound for the conditional error probability, given that a signal point from the inner circle is sent, as a function of $E_b/N_0$.

(c) How would you choose the parameters $r$ and $R$ so as to optimize the power efficiency of the constellation (at high SNR)?


Problem 6.21 (Exact symbol error probabilities for rectangular constellations) Assuming each symbol is equally likely, derive the following expressions for the average error probability for 4PAM and 16QAM:

$$P_e = \frac{3}{2}\, Q\left(\sqrt{\frac{4E_b}{5N_0}}\right), \quad \text{symbol error probability for 4PAM} \qquad (6.90)$$

$$P_e = 3\, Q\left(\sqrt{\frac{4E_b}{5N_0}}\right) - \frac{9}{4}\, Q^2\left(\sqrt{\frac{4E_b}{5N_0}}\right), \quad \text{symbol error probability for 16QAM} \qquad (6.91)$$


(Assume  4PAM  with  equally  spaced  levels  symmetric  about  the  origin,  and  rectangular  16QAM 
equivalent  to  two  4PAM  constellations  independently  modulating  the  I  and  Q  components.) 
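A quick way to sanity-check (6.90) and (6.91) is Monte Carlo simulation. The sketch below assumes 4PAM levels $\{\pm 1, \pm 3\}$ (so $E_s = 5$ and $E_b = 5/2$), noise variance $\sigma^2 = N_0/2$, and minimum distance (nearest level) decisions; it uses only the Python standard library, and the function names are hypothetical. The 16QAM expression is checked through its construction as two independent 4PAM symbols, $P_e = 1 - (1 - P_{e,4PAM})^2$.

```python
import math
import random

def Q(x):
    """Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pam4_ser(ebno):
    """(6.90): (3/2) Q(sqrt(4 Eb / (5 N0))), with Eb/N0 on a linear scale."""
    return 1.5 * Q(math.sqrt(0.8 * ebno))

def qam16_ser(ebno):
    """(6.91): 3 Q(.) - (9/4) Q(.)^2 with the same Q argument as 4PAM."""
    q = Q(math.sqrt(0.8 * ebno))
    return 3 * q - 2.25 * q * q

def pam4_ser_sim(ebno, n=100_000, seed=1):
    """Monte Carlo estimate of the 4PAM symbol error rate with ML (nearest level) decisions."""
    rng = random.Random(seed)
    levels = (-3, -1, 1, 3)
    sigma = math.sqrt((5 / 2) / ebno / 2)   # sigma^2 = N0/2 with Eb = 5/2
    errors = 0
    for _ in range(n):
        s = rng.choice(levels)
        y = s + rng.gauss(0, sigma)
        s_hat = min(levels, key=lambda a: abs(y - a))
        errors += s_hat != s
    return errors / n

ebno = 10 ** (6 / 10)   # Eb/N0 = 6 dB
print(pam4_ser(ebno), pam4_ser_sim(ebno), qam16_ser(ebno))
```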


Figure 6.35: Constellation for Problem 6.22


Problem  6.22  The  signal  constellation  shown  in  Figure  6.35  is  obtained  by  moving  the  outer 
corner  points  in  rectangular  16QAM  to  the  I  and  Q  axes. 

(a)  Sketch  the  ML  decision  regions. 

(b)  Is  the  constellation  more  or  less  power-efficient  than  rectangular  16QAM? 

Problem  6.23  Consider  a  16-ary  signal  constellation  with  4  signals  with  coordinates  (±1,  ±1), 
four  others  with  coordinates  (±3,  ±3),  and  two  each  having  coordinates  (±3,  0),  (±5,  0),  (0,  ±3), 
and  (0,±5),  respectively. 

(a)  Sketch  the  signal  constellation  and  indicate  the  ML  decision  regions. 

(b) Find an intelligent union bound on the average symbol error probability as a function of $E_b/N_0$.

(c) Find the nearest neighbors approximation to the average symbol error probability as a function of $E_b/N_0$.

(d)  Find  the  nearest  neighbors  approximation  to  the  average  symbol  error  probability  for  16QAM 
as  a  function  of  Eb/No. 

(e)  Comparing  (c)  and  (d)  (i.e.,  comparing  the  performance  at  high  SNR),  which  signal  set  is 
more  power  efficient? 


Problem 6.24 A QPSK demodulator is designed to put out an erasure when the decision is ambivalent. Thus, the decision regions are modified as shown in Figure 6.36, where the cross-hatched region corresponds to an erasure. Set $a = \ldots$, where $0 < a < 1$.

(a) Use the intelligent union bound to find approximations to the probability $p$ of symbol error and the probability $q$ of symbol erasure in terms of $E_b/N_0$ and $a$.

(b) Find exact expressions for $p$ and $q$ as functions of $E_b/N_0$ and $a$.

(c) Using the approximations in (a), find an approximate value for $a$ such that $q = 2p$ for $E_b/N_0 = \ldots$ dB.

Remark: The motivation for (c) is that a typical error-correcting code can correct twice as many erasures as errors.
Figure 6.36: QPSK with erasures


Figure  6.37:  Constellation  for  Problem  6.25 
Problem 6.25 The constellation shown in Figure 6.37 consists of two QPSK constellations lying
on  concentric  circles,  with  inner  circle  of  radius  r  and  outer  circle  of  radius  R. 

(a) For $r = 1$ and $R = 2$, redraw the constellation, and carefully sketch the ML decision regions.

(b) Still keeping $r = 1$ and $R = 2$, find an intelligent union bound for the symbol error probability as a function of $E_b/N_0$.

(c) For $r = 1$, find the best choice of $R$ in terms of high SNR performance. Compute the gain in power efficiency (in dB), if any, over the setting in (a)-(b).

Problem 6.26 Consider the constant modulus constellation shown in Figure 6.38, where $\theta < \pi/4$. Each symbol is labeled by 2 bits $(b_1, b_2)$ as shown. Assume that the constellation is used over a complex baseband AWGN channel with noise power spectral density (PSD) $N_0/2$ in each dimension. Let $(\hat{b}_1, \hat{b}_2)$ denote the maximum likelihood (ML) estimates of $(b_1, b_2)$.

Figure 6.38: Signal constellation with unequal error protection (Problem 6.26); the four points are labeled (0,0), (0,1), (1,0) and (1,1).

(a) Find $P_{e1} = P[\hat{b}_1 \ne b_1]$ and $P_{e2} = P[\hat{b}_2 \ne b_2]$ as a function of $E_s/N_0$, where $E_s$ denotes the signal energy.

(b) Assume now that the transmitter is being heard by two receivers, R1 and R2, and that R2 is twice as far away from the transmitter as R1. Assume that the received signal energy falls off as $1/r^2$, where $r$ is the distance from the transmitter, and that the noise PSD for both receivers is identical. Suppose that R1 can demodulate both bits $b_1$ and $b_2$ with error probability at least as good as $10^{-\ldots}$, i.e., so that $\max\{P_{e1}(R1), P_{e2}(R1)\} = 10^{-\ldots}$. Design the signal constellation (i.e., specify $\theta$) so that R2 can demodulate at least one of the bits with the same error probability, i.e., such that $\min\{P_{e1}(R2), P_{e2}(R2)\} = 10^{-\ldots}$.

Remark:  You  have  designed  an  unequal  error  protection  scheme  in  which  the  receiver  that  sees 
a  poorer  channel  can  still  extract  part  of  the  information  sent. 


Problem  6.27  The  2-dimensional  constellation  shown  in  Figure  6.39  is  to  be  used  for  signaling 
over  an  AWGN  channel. 

(a) Specify the ML decision if the observation is $(I, Q) = (1, -1)$.

(b)  Carefully  redraw  the  constellation  and  sketch  the  ML  decision  regions. 

(c) Find an intelligent union bound for the symbol error probability conditioned on $s_0$ being sent, as a function of $E_b/N_0$.

Problem 6.28 (Demodulation with amplitude mismatch) Consider a 4PAM system using the constellation points $\{\pm 1, \pm 3\}$. The receiver has an accurate estimate of its noise level. An automatic gain control (AGC) circuit is supposed to scale the decision statistics so that the noiseless constellation points are in $\{\pm 1, \pm 3\}$. ML decision boundaries are set according to this nominal scaling.

(a) Suppose that the AGC scaling is faulty, and the actual noiseless signal points are at $\{\pm 0.9, \pm 2.7\}$. Sketch the points and the mismatched decision regions. Find an intelligent union bound for the symbol error probability in terms of the Q function and $E_b/N_0$.

Figure 6.39: Constellation for Problem 6.27

(b)  Repeat  (a),  assuming  that  faulty  AGC  scaling  puts  the  noiseless  signal  points  at  {±1.1,  ±3.3}. 

(c)  AGC  circuits  try  to  maintain  a  constant  output  power  as  the  input  power  varies,  and  can  be 
viewed  as  imposing  a  scale  factor  on  the  input  inversely  proportional  to  the  square  root  of  the 
input  power.  In  (a),  does  the  AGC  circuit  overestimate  or  underestimate  the  input  power? 


Problem 6.29 (Demodulation with phase mismatch) Consider a BPSK system in which the receiver’s estimate of the carrier phase is off by $\theta$.

(a) Sketch the I and Q components of the decision statistic, showing the noiseless signal points and the decision region.

(b) Derive the BER as a function of $\theta$ and $E_b/N_0$ (assume that $\theta < \ldots$).

(c) Assuming now that $\theta$ is a random variable taking values uniformly in $[-\ldots, \ldots]$, numerically compute the BER averaged over $\theta$, and plot it as a function of $E_b/N_0$. Plot the BER without phase mismatch as well, and estimate the dB degradation due to the phase mismatch.


Problem 6.30 (Simplex signaling set) Let $s_0(t), \ldots, s_{M-1}(t)$ denote a set of equal energy, orthogonal signals. Construct a new M-ary signal set from these by subtracting out the average of the M signals from each signal as follows:

$$u_k(t) = s_k(t) - \frac{1}{M} \sum_{j=0}^{M-1} s_j(t), \quad k = 0, 1, \ldots, M-1$$

This is called the simplex signaling set.

(a) Find a union bound on the symbol error probability, as a function of $E_b/N_0$ and $M$, for signaling over the AWGN channel using the signal set $\{u_k(t), k = 0, 1, \ldots, M-1\}$.

(b)  Compare  the  power  efficiencies  of  the  simplex  and  orthogonal  signaling  sets  for  a  given  M, 
and  use  these  to  estimate  the  performance  difference  in  dB  between  these  two  signaling  schemes, 
for  M  =  4,8, 16,  32.  What  happens  as  M  gets  large? 

(c)  Use  computer  simulations  to  plot,  for  M  =  4,  the  error  probability  (log  scale)  versus  Eb/No 
(dB)  of  the  simplex  signaling  set  and  the  corresponding  orthogonal  signaling  set.  Are  your  results 
consistent  with  the  prediction  from  (b)? 
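The power efficiency comparison in part (b) can be checked numerically by building both signal sets as vectors in $\mathbb{R}^M$. The sketch below assumes unit-energy orthogonal signals represented by the standard basis vectors, equiprobable symbols, and $\eta_P = d_{min}^2/E_b$ as defined in this chapter; the helper names are hypothetical. Subtracting the mean leaves all pairwise distances unchanged but lowers the average energy.

```python
import itertools
import math

def eta_p(signals):
    """Power efficiency d_min^2 / E_b for equiprobable signal vectors."""
    M = len(signals)
    Es = sum(sum(c * c for c in s) for s in signals) / M
    Eb = Es / math.log2(M)
    dmin = min(math.dist(a, b) for a, b in itertools.combinations(signals, 2))
    return dmin ** 2 / Eb

def orthogonal(M):
    """Unit-energy orthogonal signals: standard basis vectors of R^M."""
    return [[1.0 if j == k else 0.0 for j in range(M)] for k in range(M)]

def simplex(M):
    """u_k = s_k - (1/M) sum_j s_j, applied to the orthogonal set."""
    orth = orthogonal(M)
    mean = [sum(col) / M for col in zip(*orth)]
    return [[a - b for a, b in zip(s, mean)] for s in orth]

def simplex_gain_db(M):
    """Power efficiency advantage of simplex over orthogonal signaling, in dB."""
    return 10 * math.log10(eta_p(simplex(M)) / eta_p(orthogonal(M)))

for M in (4, 8, 16, 32):
    print(M, round(simplex_gain_db(M), 2))
```

Each simplex signal has energy $E_s(1 - 1/M)$, so the printed gain shrinks toward 0 dB as $M$ grows.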

Problem 6.31 (Soft decisions for BPSK) Consider a BPSK system in which 0 and 1 are equally likely to be sent, with 0 mapped to +1 and 1 to -1 as usual. Thus, the decision statistic $Y = A + N$ if 0 is sent, and $Y = -A + N$ if 1 is sent, where $A > 0$ and $N \sim N(0, \sigma^2)$.

(a) Show that the LLR is conditionally Gaussian given the transmitted bit, and that the conditional distribution is scale-invariant, depending only on $E_b/N_0$.

(b)  If  the  BER  for  hard  decisions  is  10%,  specify  the  conditional  distribution  of  the  LLR,  given 
that  0  is  sent. 


Problem 6.32 (Soft decisions for PAM) Consider soft decisions for 4PAM signaling as in Example 6.1.3. Assume that the signals have been scaled to ±1, ±3 (i.e., set $A = 1$ in Example 6.1.3). The system is operating at $E_b/N_0$ of 6 dB. Bits $b_1, b_2 \in \{0, 1\}$ are mapped to the symbols using Gray coding. Assume that $(b_1, b_2) = (0, 0)$ for symbol -3, and $(1, 0)$ for symbol +3.

(a)  Sketch  the  constellation,  along  with  the  bit  maps.  Indicate  the  ML  hard  decision  boundaries. 

(b) Find the posterior symbol probability $P[-3|y]$ as a function of the noisy observation $y$. Plot it as a function of $y$.

Hint: The noise variance can be inferred from the signal levels and SNR.

(c) Find $P[b_1 = 1|y]$ and $P[b_2 = 1|y]$, and plot as a function of $y$.

Remark: The posterior probability of $b_1 = 1$ equals the sum of the posterior probabilities of all symbols which have $b_1 = 1$ in their labels.

(d) Display the results of part (c) in terms of LLRs.

$$LLR_1(y) = \log \frac{P[b_1 = 0|y]}{P[b_1 = 1|y]}, \qquad LLR_2(y) = \log \frac{P[b_2 = 0|y]}{P[b_2 = 1|y]}$$

Plot the LLRs as a function of $y$, saturating the values at ±50.

(e) Try other values of $E_b/N_0$ (e.g., 0 dB, 10 dB). Comment on any trends you notice. How do the LLRs vary as a function of distance from the noiseless signal points? How do they vary as you change $E_b/N_0$?

(f) In order to characterize the conditional distribution of the LLRs, simulate the system over multiple symbols at $E_b/N_0$ such that the BER is about 5%. Plot the histograms of the LLRs for each of the two bits, and comment on whether they look Gaussian. What happens as you increase or decrease $E_b/N_0$?
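Parts (b) through (d) of Problem 6.32 can be computed with a short script. The sketch below assumes equiprobable symbols, the Gray labeling $-3 \to (0,0)$, $-1 \to (0,1)$, $+1 \to (1,1)$, $+3 \to (1,0)$ (the only Gray map consistent with the labels given for $\pm 3$), and a noise variance inferred from $E_b/N_0$ via $E_s = 5$, $E_b = 5/2$, $\sigma^2 = N_0/2$. A max-subtraction (log-sum-exp) step keeps the exponentials from underflowing; the helper names are hypothetical.

```python
import math

LEVELS = (-3, -1, 1, 3)
LABELS = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}  # (b1, b2), Gray coded

def noise_var(ebno_db):
    """sigma^2 = N0/2, with Eb = 5/2 for levels {+-1, +-3}."""
    n0 = 2.5 / 10 ** (ebno_db / 10)
    return n0 / 2

def symbol_posteriors(y, sigma2):
    """P[a | y] for each level a, assuming equal priors."""
    logs = {a: -(y - a) ** 2 / (2 * sigma2) for a in LEVELS}
    peak = max(logs.values())
    w = {a: math.exp(v - peak) for a, v in logs.items()}
    total = sum(w.values())
    return {a: v / total for a, v in w.items()}

def bit_llrs(y, ebno_db=6.0, clip=50.0):
    """(LLR_1(y), LLR_2(y)) with LLR = log P[b=0|y] / P[b=1|y], saturated at +-clip."""
    post = symbol_posteriors(y, noise_var(ebno_db))
    out = []
    for b in (0, 1):
        p0 = sum(p for a, p in post.items() if LABELS[a][b] == 0)
        p1 = sum(p for a, p in post.items() if LABELS[a][b] == 1)
        if p0 > 0 and p1 > 0:
            llr = math.log(p0 / p1)
        else:
            llr = math.copysign(clip, p0 - p1)  # one side underflowed to zero
        out.append(max(-clip, min(clip, llr)))
    return tuple(out)

for y in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(y, bit_llrs(y))
```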


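Problem 6.33 (M-ary orthogonal signaling performance as $M \to \infty$) We wish to derive the result that

$$\lim_{M \to \infty} P(\text{correct}) = \begin{cases} 1, & \frac{E_b}{N_0} > \ln 2 \\ 0, & \frac{E_b}{N_0} < \ln 2 \end{cases} \qquad (6.92)$$

(a) Show that

$$P(\text{correct}) = \int_{-\infty}^{\infty} \left[\Phi\left(x + \sqrt{\frac{2E_b \log_2 M}{N_0}}\right)\right]^{M-1} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx$$

(b) Show that, for any $x$,

$$\lim_{M \to \infty} \left[\Phi\left(x + \sqrt{\frac{2E_b \log_2 M}{N_0}}\right)\right]^{M-1} = \begin{cases} 0, & \frac{E_b}{N_0} < \ln 2 \\ 1, & \frac{E_b}{N_0} > \ln 2 \end{cases}$$

Hint: Use L’Hospital’s rule on the log of the expression whose limit is to be evaluated.

(c) Substitute (b) into the integral in (a) to infer the desired result.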


Problem 6.34 (Effect of Rayleigh fading) Constructive and destructive interference between multiple paths in wireless systems leads to large fluctuations in received amplitude, modeled as a Rayleigh random variable $A$ (see Problem 5.21 for a definition). The energy per bit is therefore proportional to $A^2$, which, using Problem 5.21(c), is an exponential random variable. Thus, we can model $E_b/N_0$ as an exponential random variable with mean $\bar{E}_b/N_0$, where $\bar{E}_b$ is the average energy per bit. Simplify notation by setting $E_b/N_0 = X$, and the mean $\bar{E}_b/N_0 = \mu$, so that $X \sim \text{Exp}(\mu)$.

(a)  Show  that  the  average  error  probability  for  BPSK  with  Rayleigh  fading  can  be  written  as 


Pe  = 


Q{\/^)  jae  dx 


Hint:  The  error  probability  for  BPSK  is  given  by  Q  ^  j ,  where  E},/Nq  is  a  random  variable. 

We  now  hnd  the  expected  error  probability  by  averaging  over  the  distribution  of  Eij/Nq. 

(b)  Integrating  by  parts  and  simplifying,  show  that  the  average  error  probability  can  be  written 
as 


n  =  l(i-(i+,)-i)  =  i(i-,i  +  |)4) 


Hint:  Q{x)  is  dehned  via  an  integral,  so  we  can  hnd  its  derivative  (when  integrating  by  parts) 
using  the  fundamental  theorem  of  calculus. 

(c)  Using  the  approximation  that  (1  +  a)^  1  +  ba  for  |a|  small,  show  that 


4{E,/No) 


at  high  SNR.  Comment  on  how  this  decay  of  error  probability  with  the  reciprocal  of  SNR 
compares  with  the  decay  for  the  AWGN  channel. 

(d) Plot the error probability versus $\bar{E}_b/N_0$ for BPSK over the AWGN and Rayleigh fading channels (BER on log scale, $\bar{E}_b/N_0$ in dB). Note that $\bar{E}_b = E_b$ for the AWGN channel. At BER of $10^{-?}$, what is the degradation in dB due to Rayleigh fading?
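As a numerical sanity check on parts (a) through (c), the sketch below evaluates the averaging integral of part (a) with a simple midpoint rule and compares it with the closed form of part (b) and the high-SNR approximation of part (c). It uses Python with only the standard library (our choice; the text prescribes no language). The truncation point and step count for the integral are our own assumptions, and `mean_snr` stands for $\bar{E}_b/N_0 = 1/\mu$ as a linear ratio.

```python
import math

def qfunc(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_closed_form(mean_snr):
    """Part (b): (1/2) * (1 - (1 + mu)^(-1/2)) with mu = 1 / mean_snr."""
    mu = 1.0 / mean_snr
    return 0.5 * (1.0 - (1.0 + mu) ** -0.5)

def ber_numeric(mean_snr, steps=100000):
    """Midpoint rule for the part (a) integral of Q(sqrt(2x)) * mu * exp(-mu x).

    The integral is truncated at x_max. Beyond it the integrand is negligible,
    since Q(sqrt(2x)) <= exp(-x) / 2 and the exponential density also decays.
    """
    mu = 1.0 / mean_snr
    x_max = min(50.0 * mean_snr, 60.0)
    dx = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += qfunc(math.sqrt(2.0 * x)) * mu * math.exp(-mu * x)
    return total * dx

def ber_high_snr(mean_snr):
    """Part (c): P_e is approximately 1 / (4 mean_snr) at high SNR."""
    return 1.0 / (4.0 * mean_snr)

def ber_awgn(snr):
    """BPSK over AWGN, for the comparison in part (d)."""
    return qfunc(math.sqrt(2.0 * snr))

for snr_db in (0, 10, 20, 30):
    g = 10 ** (snr_db / 10.0)
    print(snr_db, ber_awgn(g), ber_closed_form(g), ber_high_snr(g))
```

The printed rows make the contrast in part (c) visible. The AWGN column falls off exponentially, while the fading column drops by only about a factor of 10 per 10 dB.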


Link  budget  analysis 

Problem 6.35 You are given an AWGN channel of bandwidth 3 MHz. Assume that implementation constraints dictate an excess bandwidth of 50%. Find the achievable bit rate, the $E_b/N_0$ required for a BER of $10^{-6}$, and the receiver sensitivity (assuming a receiver noise figure of 7 dB) for the following modulation schemes, assuming that the bit-to-symbol map is optimized to minimize the BER whenever possible:

(a) QPSK, (b) 8PSK, (c) 64QAM, (d) coherent 16-ary orthogonal signaling.

Remark:  Use  nearest  neighbors  approximations  for  the  BER. 
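The link budget computations in Problem 6.35 follow a fixed recipe for linear modulation. The symbol rate is $B/(1+a)$ for excess bandwidth $a$, and the bit rate is the symbol rate times $\log_2 M$. The receiver sensitivity in dBm is $(E_b/N_0)_{\mathrm{dB}} + 10\log_{10} R_b + F_{\mathrm{dB}} - 174$, where $-174$ dBm/Hz is $kT_0$ at the standard room temperature $T_0 = 290$ K. The sketch below applies this recipe only to part (a), Gray coded QPSK, whose BER is exactly $Q(\sqrt{2E_b/N_0})$. The function names are hypothetical. The other schemes need their own nearest neighbors BER expressions in place of the QPSK one, and part (d) also needs its own bandwidth/bit rate relation.

```python
import math

def qfunc(x):
    """Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def qfunc_inv(p, lo=0.0, hi=20.0, iters=100):
    """Invert the decreasing function Q on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if qfunc(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def qpsk_link(bandwidth_hz, excess_bw, target_ber, noise_figure_db):
    """Bit rate (bps), required Eb/N0 (dB) and sensitivity (dBm) for Gray coded QPSK."""
    symbol_rate = bandwidth_hz / (1.0 + excess_bw)
    bit_rate = 2.0 * symbol_rate                  # log2(4) = 2 bits per symbol
    # BER = Q(sqrt(2 Eb/N0))  =>  Eb/N0 = Q^{-1}(BER)^2 / 2
    ebno_db = 10.0 * math.log10(qfunc_inv(target_ber) ** 2 / 2.0)
    # kT0 = -174 dBm/Hz at T0 = 290 K
    sensitivity_dbm = ebno_db + 10.0 * math.log10(bit_rate) + noise_figure_db - 174.0
    return bit_rate, ebno_db, sensitivity_dbm

print(qpsk_link(3e6, 0.5, 1e-6, 7.0))
```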


Problem  6.36  Consider  the  setting  of  Example  6.5.1. 

(a) For all parameters remaining the same, find the range and bit rate when using a 64QAM constellation.

(b) Suppose now that the channel model is changed from AWGN to Rayleigh fading (see Problem 6.34). Find the receiver sensitivity required for QPSK at BER of $10^{-?}$. (In practice, we would shoot for a higher uncoded BER, and apply channel coding, but we discuss such methods in later chapters.) What is the range, assuming all other parameters are as in Example 6.5.1? How does the range change if you reduce the link margin to 10 dB (now that fading is being accounted for, there are fewer remaining uncertainties)?
Problem 6.37 Consider a line-of-sight communication link operating in the 60 GHz band (where large amounts of unlicensed bandwidth have been set aside by regulators). From version 1 of the Friis formula (6.83), we see that the received power scales as $\lambda^2$, and hence as the inverse square of the carrier frequency, so that 60 GHz links have much worse propagation than, say, 5 GHz links when antenna gains are fixed. However, from (6.82), we see that we can get much better antenna gains at small carrier wavelengths for a fixed form factor, and version 2 of the Friis formula (6.84) shows that the received power scales as $1/\lambda^2$, which improves with increasing carrier frequency. Furthermore, electronically steerable antenna arrays with high gains can be implemented with compact form factor (e.g., patterns of metal on a circuit board) at higher carrier frequencies such as 60 GHz. Suppose, now, that we wish to design a 2 Gbps link using QPSK with an excess bandwidth of 50%. The receiver noise figure is 8 dB, and the desired link margin is 10 dB.

(a) What is the transmit power in dBm required to attain a range of 10 meters (e.g., for in-room communication), assuming that the transmit and receive antenna gains are each 10 dBi?

(b)  For  a  transmit  power  of  20  dBm,  what  are  the  antenna  gains  required  at  the  transmitter  and 
receiver  (assume  that  the  gains  at  both  ends  are  equal)  to  attain  a  range  of  200  meters  (e.g.,  for 
an  outdoor  last-hop  link)? 

(c)  For  the  antenna  gains  found  in  (b),  what  happens  to  the  attainable  range  if  you  account  for 
additional  path  loss  due  to  oxygen  absorption  (typical  in  the  60  GHz  band)  of  16  dB/km? 

(d)  In  (c),  what  happens  to  the  attainable  range  if  there  is  a  further  path  loss  of  30  dB/km  due 
to  heavy  rain  (on  top  of  the  loss  due  to  oxygen  absorption)? 
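For Problem 6.37, the sketch below does the same bookkeeping in dB. It assumes the standard free-space form of the Friis formula, $P_R = P_T G_T G_R \left(\lambda/(4\pi R)\right)^2$, with $c = 3\times 10^8$ m/s. The receiver sensitivity is left as an input because the target BER for this link is not restated in the problem. Extra losses such as oxygen absorption or rain are modeled as a fixed number of dB per km added to the free-space loss. Since the total loss increases with $R$, the attainable range is found by bisection. All function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 3e8  # m/s (rounded)

def free_space_loss_db(range_m, freq_hz):
    """20 log10(4 pi R / lambda): free-space loss between isotropic antennas."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * range_m / wavelength)

def total_loss_db(range_m, freq_hz, extra_db_per_km=0.0):
    """Free-space loss plus a distance-proportional extra loss (e.g., oxygen, rain)."""
    return free_space_loss_db(range_m, freq_hz) + extra_db_per_km * range_m / 1000.0

def required_tx_power_dbm(sensitivity_dbm, margin_db, gt_dbi, gr_dbi,
                          range_m, freq_hz, extra_db_per_km=0.0):
    """P_TX = P_RX,min + margin - G_T - G_R + path loss, all in dB units."""
    return (sensitivity_dbm + margin_db - gt_dbi - gr_dbi
            + total_loss_db(range_m, freq_hz, extra_db_per_km))

def max_range_m(budget_db, freq_hz, extra_db_per_km=0.0, lo=1e-3, hi=1e6):
    """Largest range whose total loss fits within budget_db (bisection in log R)."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if total_loss_db(mid, freq_hz, extra_db_per_km) > budget_db:
            hi = mid
        else:
            lo = mid
    return lo
```

Here the path-loss budget is $P_{TX} + G_T + G_R - \text{margin} - P_{RX,\min}$, all in dB. For parts (c) and (d), one would pass `extra_db_per_km=16.0` and `extra_db_per_km=46.0` respectively to `max_range_m`, using the budget from part (b).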


Problem 6.38 A 10 Mbps line-of-sight communication link operating at a carrier frequency of 1 GHz has a designed range of 5 km. The link employs 16QAM with an excess bandwidth of 25%, with a designed BER of $10^{-6}$ and a link margin of 10 dB. The receiver noise figure is 4 dB, and the transmit and receive antenna gains are 10 dBi each. This is the baseline scenario against which each of the scenarios in (a)-(c) is to be compared.

(a)  Suppose  that  you  change  the  carrier  frequency  to  5  GHz,  keeping  all  other  link  parameters 
the  same.  What  is  the  new  range? 

(b)  Suppose  that  you  change  the  carrier  frequency  to  5  GHz  and  increase  the  transmit  and  receive 
antenna  gains  by  3  dBi  each,  keeping  all  other  link  parameters  the  same.  What  is  the  new  range? 

(c)  Suppose  you  change  the  carrier  frequency  to  5  GHz,  increase  the  transmit  and  receive  antenna 
directivities  by  3  dBi  each,  and  increase  the  data  rate  to  40  Mbps,  still  using  16QAM  with  excess 
bandwidth  of  25%.  All  other  link  parameters  are  the  same.  What  is  the  new  range? 
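For Problem 6.38, assume free space and a fixed constellation, excess bandwidth, noise figure and target BER. The received power then scales as $G_T G_R \lambda^2/R^2$, while the receiver sensitivity scales linearly with the bit rate. Hence the range scales as $\lambda\sqrt{G_T G_R/R_b}$, and each scenario reduces to a one-line scaling of the 5 km baseline. The sketch below encodes this rule as a hypothetical helper of our own. It ignores any non-free-space effects.

```python
import math

def scaled_range(base_range_m, base_freq_hz, new_freq_hz,
                 gain_change_db=0.0, rate_factor=1.0):
    """Free-space range after changing carrier, antenna gains and bit rate.

    gain_change_db is the change in the product G_T G_R (in dB); rate_factor is
    new bit rate / old bit rate. Received power ~ G_T G_R (lambda / R)^2 and the
    sensitivity ~ bit rate, so R ~ lambda * sqrt(G_T G_R / R_b).
    """
    wavelength_ratio = base_freq_hz / new_freq_hz
    gain_ratio = 10.0 ** (gain_change_db / 20.0)   # square root of the power ratio
    return base_range_m * wavelength_ratio * gain_ratio / math.sqrt(rate_factor)

print(scaled_range(5000.0, 1e9, 5e9))                                        # part (a)
print(scaled_range(5000.0, 1e9, 5e9, gain_change_db=6.0))                    # part (b)
print(scaled_range(5000.0, 1e9, 5e9, gain_change_db=6.0, rate_factor=4.0))   # part (c)
```

By our own evaluation, these calls give about 1 km, 2 km and 1 km. Moving to 5 GHz costs a factor of 5 in range. The extra 6 dB of combined antenna gain roughly doubles the range, and quadrupling the bit rate halves it again.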


Software  Lab  6.1:  Linear  modulation  with  two-dimensional 
constellations 

This  is  a  follow-on  to  Software  Lab  4.1,  the  code  from  which  is  our  starting  point  here.  The 
objective  is  to  implement  in  complex  baseband  a  linearly  modulated  system  for  a  variety  of  signal 
constellations.  We  wish  to  estimate  the  performance  of  these  schemes  for  an  ideal  channel  via 
simulation, and to compare with analytical expressions. As in Software Lab 4.1, we use a trivial channel filter in this lab. Dispersive channels are considered in Chapter 8 and the associated labs.

0)  Use  the  code  for  Software  Lab  4.1  as  a  starting  point. 

1)  Write  a  matlab  function  randbit  that  generates  random  bits  taking  values  in  {0, 1}  (not  ±1) 
with  equal  probability. 

2) Write the following functions mapping bits to symbols for different signal constellations. Write the functions to allow for vector inputs and outputs. The mapping is said to be a Gray code, or Gray labeling, if the bit labels of nearest neighbors in the constellation differ in exactly one bit. In all of the following, choose the bit map to be a Gray code.

(a) bpskmap: input a 0/1 bit, output a ±1 bit.

(b) qpskmap: input 2 0/1 bits, output a symbol taking one of four values in $\pm 1 \pm j$.

(c) fourpammap: input 2 0/1 bits, output a symbol taking one of four values in $\{\pm 1, \pm 3\}$.

(d) sixteenqammap: input 4 0/1 bits, output a symbol taking one of 16 values in $\{b_c + jb_s : b_c, b_s \in \{\pm 1, \pm 3\}\}$.

(e) eightpskmap: input 3 0/1 bits, output a symbol taking one of 8 values in $\{e^{j2\pi i/8} : i = 0, 1, \ldots, 7\}$.

3) BPSK symbol generation: Use part 1 to generate 12000 0/1 bits. Map these to BPSK (±1) bits using bpskmap. Pass these through the transmit and receive filter in lab 1 to get noiseless received samples at rate 4/T, as before.

4) Adding noise: We consider discrete time additive white Gaussian noise (AWGN). At the input to the receive filter, add independent and identically distributed (iid) complex Gaussian noise, such that the real and imaginary part of each sample are iid $N(0, \sigma^2)$ (you will choose $\sigma^2 = N_0/2$ corresponding to a specified value of $E_b/N_0$, as described in part 5). Pass these (rate 4/T) noise samples through the receive filter, and add the result to the output of part 3.

Remark: If the nth transmitted symbol is $b[n]$, the average received energy per symbol is $E_s = E[|b[n]|^2]\,\|g_T * g_C\|^2$. Divide that by the number of bits per symbol to get $E_b$. The noise variance per dimension is $\sigma^2 = N_0/2$. This enables you to compute $E_b/N_0$ for your simulation model. The signal-to-noise ratio $E_b/N_0$ is usually expressed in decibels (dB): $E_b/N_0\,(\mathrm{dB}) = 10\log_{10} E_b/N_0\,(\mathrm{raw})$. Thus, if you fix the transmit and channel filter coefficients, then you can simulate any given value of $E_b/N_0$ in dB by varying the value of the noise variance $\sigma^2$.

5) Plot the ideal bit error probability for BPSK, which is given by $Q\left(\sqrt{2E_b/N_0}\right)$, on a log scale as a function of Eb/No in dB over the range 0-10 dB. Find the value of Eb/No that corresponds to an error probability of $10^{-?}$.

6) For the value of Eb/No found in part 5, choose the corresponding value of $\sigma^2$ in part 4. Find the decision statistics corresponding to the transmitted symbols at the input and output of the receive filter, as in lab 1 (parts 5 and 6). Plot the imaginary versus the real parts of the decision statistics; you should see a noisy version of the constellation.

7) Using an appropriate decision rule, make decisions on the 12000 transmitted bits based on the 12000 decision statistics, and measure the error probability obtained at the input and the output. Compare the results with the ideal error probability from part 5. You should find that the error probability based on the receiver input samples is significantly worse than that based on the receiver output, and that the latter is a little worse than the ideal performance because of the ISI in the decision statistics.

8) Now, map 12000 0/1 bits into 6000 4PAM symbols using function fourpammap (use as input 2 parallel vectors of 6000 bits). As shown in Chapter 6, a good approximation (the nearest neighbors approximation) to the ideal bit error probability for Gray coded 4PAM is given by $Q\left(\sqrt{4E_b/(5N_0)}\right)$. As in part 5), plot this on a log scale as a function of Eb/No in dB over the range 0-10 dB. What is the value of Eb/No (dB) corresponding to a bit error probability of $10^{-?}$?

9) Choose the value of the noise variance corresponding to the Eb/No found in part 8. Now, find decision statistics for the 6000 transmitted symbols based on the receive filter output only.

(a)  Plot  the  imaginary  versus  the  real  parts  of  the  decision  statistics,  as  before. 

(b)  Determine  an  appropriate  decision  rule  for  estimating  the  two  parallel  bit  streams  of  6000 
bits  from  the  6000  complex  decision  statistics. 

(c)  Measure  the  bit  error  probability,  and  compare  it  with  the  ideal  bit  error  probability. 

10) Repeat parts 8 and 9 for QPSK, the ideal bit error probability for which, as a function of Eb/No, is the same as for BPSK.

11)  Repeat  parts  8  and  9  for  16QAM  (4  bit  streams  of  length  3000  each),  the  ideal  bit  error 
probability  for  which,  as  a  function  of  Eb/No,  is  the  same  as  for  4PAM. 

12) Repeat parts 8 and 9 for 8PSK (3 bit streams of length 4000 each). The ideal bit error probability for Gray coded 8PSK is approximated by (using the nearest neighbors approximation) $Q\left(\sqrt{6E_b/N_0}\,\sin(\pi/8)\right)$.


13)  Since  all  your  answers  above  will  be  off  from  the  ideal  answers  because  of  some  ISI,  run  a 
simulation  with  12000  bits  sent  using  Gray-coded  16-QAM  with  no  ISI.  To  do  this,  generate  the 
decision  statistics  by  adding  noise  directly  to  the  transmitted  symbols,  setting  the  noise  variance 
appropriately to operate at the required Eb/No. Do this for two different values of Eb/No, the one
in  part  11  and  a  value  3  dB  higher.  In  each  case,  compare  the  nearest  neighbors  approximation 
to  the  measured  bit  error  probability,  and  plot  the  imaginary  versus  real  part  of  the  decision 
statistics. 
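The lab asks for matlab. Purely to illustrate the logic of parts 2 and 13, the sketch below uses Python with only the standard library, with lists standing in for matlab vectors. The Gray labelings are one valid choice among several, which is an assumption:

- For 4PAM, the levels $-3, -1, 1, 3$ carry the labels 00, 01, 11, 10.
- 16QAM applies this labeling independently to the I and Q components.
- 8PSK places a reflected Gray sequence around the circle $e^{j2\pi i/8}$.

The ISI-free simulation of part 13 adds noise directly to the symbols, with $\sigma^2 = N_0/2$ and $E_b = E[|b[n]|^2]/4 = 10/4$. All helper names other than the map names given in part 2 are hypothetical.

```python
import math
import random

# One valid Gray labeling of the 4PAM levels (an assumption; others exist).
PAM4_LEVEL = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): 1.0, (1, 0): 3.0}
# Reflected Gray sequence for the 8PSK phase indices 0..7 (it also wraps around).
PSK8_GRAY = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0),
             (1, 1, 0), (1, 1, 1), (1, 0, 1), (1, 0, 0)]

def bpskmap(bits):
    return [1.0 - 2.0 * b for b in bits]                 # 0 -> +1, 1 -> -1

def qpskmap(b1, b2):
    return [complex(1 - 2 * x, 1 - 2 * y) for x, y in zip(b1, b2)]

def fourpammap(b1, b2):
    return [PAM4_LEVEL[(x, y)] for x, y in zip(b1, b2)]

def sixteenqammap(b1, b2, b3, b4):
    # Gray coded 4PAM on the I channel (b1, b2) and on the Q channel (b3, b4).
    return [complex(i, q) for i, q in zip(fourpammap(b1, b2), fourpammap(b3, b4))]

def eightpskmap(b1, b2, b3):
    index = {label: i for i, label in enumerate(PSK8_GRAY)}
    return [complex(math.cos(math.pi * index[t] / 4), math.sin(math.pi * index[t] / 4))
            for t in zip(b1, b2, b3)]

def pam4_decide(x):
    """Minimum distance 4PAM decision (thresholds -2, 0, 2); returns the 2 bits."""
    if x < -2.0:
        return (0, 0)
    if x < 0.0:
        return (0, 1)
    if x < 2.0:
        return (1, 1)
    return (1, 0)

def sim_16qam_no_isi(ebno_db, num_bits=12000, seed=0):
    """Part 13: noise added directly to Gray coded 16QAM symbols; returns measured BER."""
    rng = random.Random(seed)
    n_sym = num_bits // 4
    bits = [[rng.getrandbits(1) for _ in range(n_sym)] for _ in range(4)]
    symbols = sixteenqammap(*bits)
    eb = 10.0 / 4.0                  # E|b|^2 = 10 for this constellation, 4 bits/symbol
    sigma = math.sqrt(eb / 10 ** (ebno_db / 10.0) / 2.0)   # sigma^2 = N0 / 2
    errors = 0
    for k, s in enumerate(symbols):
        z = s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        decided = pam4_decide(z.real) + pam4_decide(z.imag)
        errors += sum(d != bits[m][k] for m, d in enumerate(decided))
    return errors / (4 * n_sym)
```

For part 13, one would call `sim_16qam_no_isi` at the Eb/No from part 11 and again at a value 3 dB higher. The decision rule is vectorized per axis, as the tips below suggest.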


Lab  Report:  Your  lab  report  should  document  the  results  of  the  preceding  steps  in  order.  Describe 
the  reasoning  you  used  and  the  difficulties  you  encountered. 

Tips:  Vectorize  as  many  of  the  functions  as  possible,  including  both  the  bit-to-symbol  maps  and 
the  decision  rules.  Do  BPSK  and  4-PAM  first,  where  you  will  only  use  the  real  part  of  the  complex 
decision  statistics.  Leverage  this  for  QPSK  and  16-QAM,  by  replicating  what  you  did  for  the 
imaginary  part  of  the  decision  statistics  as  well.  To  avoid  confusion,  keep  different  matlab  files 
for  simulations  regarding  different  signal  constellations,  and  keep  the  analytical  computations 
and  plots  separate  from  the  simulations. 
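Following the tips, the analytical side of the lab is kept separate from the simulations. It amounts to evaluating a few closed-form curves and inverting them at a target BER. The sketch below uses the coefficient-free nearest neighbors forms $Q(d_{\min}/(2\sigma))$, written in terms of $E_b/N_0$:

- $Q(\sqrt{2E_b/N_0})$ for BPSK and QPSK.
- $Q(\sqrt{4E_b/(5N_0)})$ for Gray coded 4PAM and 16QAM.
- $Q(\sqrt{6E_b/N_0}\sin(\pi/8))$ for Gray coded 8PSK.

Whether to put a multiplicity factor in front of $Q$ is an assumption you may want to revisit. The inversion routine is a plain bisection of our own.

```python
import math

def qfunc(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_bpsk(ebno):
    """BPSK and Gray coded QPSK: Q(sqrt(2 Eb/N0)); ebno is a linear ratio."""
    return qfunc(math.sqrt(2.0 * ebno))

def ber_4pam_nn(ebno):
    """Gray coded 4PAM and 16QAM, coefficient-free nearest neighbors form."""
    return qfunc(math.sqrt(4.0 * ebno / 5.0))

def ber_8psk_nn(ebno):
    """Gray coded 8PSK, coefficient-free nearest neighbors form."""
    return qfunc(math.sqrt(6.0 * ebno) * math.sin(math.pi / 8.0))

def ebno_db_for_ber(ber_fn, target, lo_db=-10.0, hi_db=30.0):
    """Eb/N0 (dB) at which ber_fn equals target; ber_fn is decreasing, so bisect."""
    for _ in range(100):
        mid = 0.5 * (lo_db + hi_db)
        if ber_fn(10.0 ** (mid / 10.0)) > target:
            lo_db = mid
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)

# Points for the log-scale plots over 0-10 dB (the plotting itself is left out).
curves = {name: [(db, fn(10.0 ** (db / 10.0))) for db in range(11)]
          for name, fn in [("BPSK/QPSK", ber_bpsk), ("4PAM/16QAM", ber_4pam_nn),
                           ("8PSK", ber_8psk_nn)]}
```

For comparison, consider the exact bit error probability of Gray coded 4PAM with labels 00, 01, 11, 10 and minimum distance decisions. A direct calculation gives $\frac{3}{4}Q(a) + \frac{1}{2}Q(3a) - \frac{1}{4}Q(5a)$ with $a = \sqrt{4E_b/(5N_0)}$. At moderate SNR this is essentially $\frac{3}{4}Q(a)$, slightly below the coefficient-free curve.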


Software  Lab  6.2:  Modeling  and  performance  evaluation 
on  a  wireless  fading  channel 

Let us consider the following simple model of a wireless channel (obtained after filtering and sampling at the symbol rate, and assuming that there is no ISI). If $\{b[n]\}$ is the transmitted symbol sequence, then the complex-valued received sequence is given by

$$y[n] = h[n]b[n] + w[n] \qquad (6.93)$$

where $\{w[n] = w_c[n] + jw_s[n]\}$ is an iid complex Gaussian noise sequence with $w_c[n]$, $w_s[n]$ i.i.d. $N(0, \sigma^2 = N_0/2)$ random variables. We say that $w[n]$ has variance $\sigma^2$ per dimension. The channel sequence $\{h[n]\}$ is a time-varying sequence of complex gains.

Equation  (6.93)  models  the  channel  at  a  given  time  as  a  simple  scalar  gain  h[n].  On  the  other 
hand,  as  discussed  in  Example  2.5.4,  a  multipath  wireless  channel  cannot  be  modeled  as  a  simple 
scalar  gain:  it  is  dispersive  in  time,  and  exhibits  frequency  selectivity.  However,  it  is  shown  in 
Chapter 8 that we can decompose complicated dispersive channels into scalar models by using frequency-domain modulation, or OFDM, which transmits data in parallel over narrow enough
frequency  slices  such  that  the  channel  over  each  slice  can  be  modeled  as  a  complex  scalar. 
Equation  (6.93)  could  therefore  be  interpreted  as  modeling  time  variations  in  such  scalar  gains. 

Rayleigh fading: The channel gain sequence is $\{h[n] = h_c[n] + jh_s[n]\}$, where $\{h_c[n]\}$ and $\{h_s[n]\}$ are zero mean, independent and identically distributed colored Gaussian random processes. The reason this is called Rayleigh fading is that $|h[n]| = \sqrt{h_c^2[n] + h_s^2[n]}$ is a Rayleigh random variable.

Remark:  The  Gaussianity  arises  because  the  overall  channel  gain  results  from  a  superposition  of 
gains  from  multiple  reflections  off  scatterers. 
Simulation of Rayleigh fading: We will use a simple model wherein the colored channel gain sequence $\{h[n]\}$ is obtained by passing white Gaussian noise through a first-order recursive filter, as follows:


$$\begin{aligned} h_c[n] &= \rho\, h_c[n-1] + u[n] \\ h_s[n] &= \rho\, h_s[n-1] + v[n] \end{aligned} \qquad (6.94)$$


where $\{u[n]\}$ and $\{v[n]\}$ are independent real-valued white Gaussian sequences, with i.i.d. $N(0, \beta^2)$ elements. The parameter $\rho$ ($0 < \rho < 1$) determines how rapidly the channel varies. The models for the I and Q gains in (6.94) are examples of first-order autoregressive (AR(1)) random processes: autoregressive because future values depend on the past in a linear fashion, and first order because only the immediately preceding value affects the current one.

Setting  up  the  fading  simulator 

(a) Set up the AR(1) Rayleigh fading model in matlab, with $\rho$ and $\beta^2$ as programmable parameters.

(b) Calculate $E[|h[n]|^2] = 2E[h_c^2[n]]$ analytically as a function of $\rho$ and $\beta^2$. Use simulation to verify your results, setting $\rho = .99$ and $\beta = .01$. You may choose to initialize $h_c[0]$ and $h_s[0]$ as iid $N(0, \alpha^2)$ in your simulation, where $\alpha^2 = E[h_c^2[n]]$ denotes the stationary variance of each component. Use at least 10,000 samples.

(c) Plot the instantaneous channel power relative to the average channel power, $|h[n]|^2/(2\alpha^2)$, in dB as a function of $n$. Thus, 0 dB corresponds to the average value of $2\alpha^2$. You will occasionally see sharp dips in the power, which are termed deep fades.

(d) Define the channel phase $\theta[n] = \mathrm{angle}(h[n]) = \tan^{-1}\frac{h_s[n]}{h_c[n]}$. Plot $\theta[n]$ versus $n$. Compare with (c); you should see sharp phase changes corresponding to deep fades.
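The sketch below covers parts (a) through (d), again in Python rather than matlab. Taking variances of both sides of (6.94) in steady state gives $\alpha^2 = \rho^2\alpha^2 + \beta^2$. Hence $\alpha^2 = \beta^2/(1-\rho^2)$ and $E[|h[n]|^2] = 2\beta^2/(1-\rho^2)$. The simulator starts $h_c[0]$ and $h_s[0]$ in this stationary distribution, so there is no start-up transient. The sample count of 200,000 is our choice. With $\rho = 0.99$, successive samples are strongly correlated, so an average over only 10,000 samples can still be off by roughly ten percent. Function names are hypothetical.

```python
import cmath
import math
import random

def ar1_rayleigh(num_samples, rho, beta, seed=0):
    """Complex gains h[n] = h_c[n] + j h_s[n] from the AR(1) recursion (6.94)."""
    rng = random.Random(seed)
    alpha = beta / math.sqrt(1.0 - rho ** 2)    # stationary std of h_c and h_s
    hc, hs = rng.gauss(0.0, alpha), rng.gauss(0.0, alpha)
    gains = []
    for _ in range(num_samples):
        gains.append(complex(hc, hs))
        hc = rho * hc + rng.gauss(0.0, beta)     # u[n] ~ N(0, beta^2)
        hs = rho * hs + rng.gauss(0.0, beta)     # v[n] ~ N(0, beta^2)
    return gains

def mean_power_theory(rho, beta):
    """E|h[n]|^2 = 2 alpha^2 with alpha^2 = beta^2 / (1 - rho^2)."""
    return 2.0 * beta ** 2 / (1.0 - rho ** 2)

def power_db(gains, rho, beta):
    """Part (c): instantaneous power relative to the average, in dB."""
    avg = mean_power_theory(rho, beta)
    return [10.0 * math.log10(abs(g) ** 2 / avg) for g in gains]

def phases(gains):
    """Part (d): four-quadrant channel phase in [-pi, pi]."""
    return [cmath.phase(g) for g in gains]

h = ar1_rayleigh(200000, rho=0.99, beta=0.01, seed=3)
estimate = sum(abs(g) ** 2 for g in h) / len(h)
print(estimate, mean_power_theory(0.99, 0.01))
```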

QPSK  in  Rayleigh  fading 

Now, implement the model (6.93), where $\{b[n]\}$ correspond to Gray coded QPSK, using an AR(1) simulation of Rayleigh fading as in (a). Assume that the receiver has perfect knowledge of the channel gains $\{h[n]\}$, and employs the decision statistic $Z[n] = h^*[n]y[n]$.

Remark: In practice, the channel estimation required for implementing this is achieved by inserting pilot symbols periodically into the data stream. The performance will, of course, be worse than with the ideal channel estimates considered here.

(e) Do scatter plots of the two-dimensional received symbols $\{y[n]\}$, and of the decision statistics $\{Z[n]\}$. What does multiplying by $h^*[n]$ achieve?

(f) Implement a decision rule for the bits encoded in the QPSK symbols based on the statistics $\{Z[n]\}$. Estimate by simulation, and plot, the bit error probability (log scale) as a function of the average $E_b/N_0$ (dB), where $E_b/N_0$ ranges from 0 to 30 dB. Use at least 10,000 symbols for your estimate. On the same plot, also plot the analytical bit error probability as a function of $E_b/N_0$ when there is no fading. You should see a marked degradation due to fading. How do you think the error probability in fading varies with $E_b/N_0$?

Relating simulation parameters to Eb/No: The average symbol energy is $E_s = E[|b[n]|^2]\,E[|h[n]|^2]$, and $E_b = E_s/\log_2 M = E_s/2$ for QPSK. This is a function of the constellation scaling and the parameters $\beta$ and $\rho$ in the fading simulator (see (b)). You can therefore fix $E_s$, and hence $E_b$, by fixing $\beta$ and $\rho$ (e.g., as in (b)), and fixing the scaling of the $\{b[n]\}$ (e.g., keep the constellation points as $\pm 1 \pm j$). $E_b/N_0$ can now be varied by varying the variance of the noise in (6.93).
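Below is a minimal sketch of parts (e) and (f) under the normalization just described. The constellation points are $\pm 1 \pm j$, so $E_s = 2E[|h[n]|^2] = 4\alpha^2$, $E_b = 2\alpha^2$ and $\sigma^2 = N_0/2$. Each bit is decided from the sign of the real or imaginary part of $Z[n] = h^*[n]y[n]$. For reference, the sketch also includes the standard closed form $\frac{1}{2}\left(1-\sqrt{\bar{\gamma}/(1+\bar{\gamma})}\right)$ for Gray coded QPSK in Rayleigh fading with ideal channel knowledge, where $\bar{\gamma}$ is the average $E_b/N_0$. This is the same expression as in Problem 6.34(b). Function names are hypothetical.

```python
import math
import random

def qpsk_rayleigh_ber(ebno_db, num_symbols, rho=0.99, beta=0.01, seed=0):
    """Monte Carlo BER of Gray coded QPSK over AR(1) Rayleigh fading, ideal CSI."""
    rng = random.Random(seed)
    alpha = beta / math.sqrt(1.0 - rho ** 2)      # stationary std of h_c, h_s
    hc, hs = rng.gauss(0.0, alpha), rng.gauss(0.0, alpha)
    eb = 2.0 * alpha ** 2                         # E_s = 2 * 2 alpha^2, E_b = E_s / 2
    sigma = math.sqrt(eb / 10.0 ** (ebno_db / 10.0) / 2.0)   # sigma^2 = N0 / 2
    errors = 0
    for _ in range(num_symbols):
        h = complex(hc, hs)
        b_i, b_q = rng.getrandbits(1), rng.getrandbits(1)
        b = complex(1 - 2 * b_i, 1 - 2 * b_q)      # Gray coded QPSK, points ±1 ± j
        y = h * b + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))   # (6.93)
        z = h.conjugate() * y                      # Z[n] = h*[n] y[n]
        errors += (z.real < 0) != (b_i == 1)
        errors += (z.imag < 0) != (b_q == 1)
        hc = rho * hc + rng.gauss(0.0, beta)       # AR(1) update (6.94)
        hs = rho * hs + rng.gauss(0.0, beta)
    return errors / (2 * num_symbols)

def qpsk_rayleigh_theory(ebno_db):
    """(1/2)(1 - sqrt(g / (1 + g))), where g is the average Eb/N0 (linear)."""
    g = 10.0 ** (ebno_db / 10.0)
    return 0.5 * (1.0 - math.sqrt(g / (1.0 + g)))
```

Setting `rho=0.0` is outside the range $0 < \rho < 1$ used in the lab, but the code accepts it. It gives independent fading from symbol to symbol, so the Monte Carlo estimate converges much faster. This is one way to check the code before running the correlated case.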

Diversity 

The severe degradation due to Rayleigh fading can be mitigated by using diversity: the probability that two paths are simultaneously in a deep fade is smaller than the probability that a single path is in a deep fade. Consider a receive antenna diversity system, where the received signals $y_1$ and $y_2$ at the two antennas are given by

$$\begin{aligned} y_1[n] &= h_1[n]b[n] + w_1[n] \\ y_2[n] &= h_2[n]b[n] + w_2[n] \end{aligned} \qquad (6.95)$$

Thus,  you  get  two  looks  at  the  data  stream,  through  two  different  channels. 

Implement the two-fold diversity system in (6.95) as you implemented (6.93), keeping the following in mind:

• The noises $w_1$ and $w_2$ are independent white noise sequences with variance $\sigma^2 = N_0/2$ per dimension as before.

• The channels $h_1$ and $h_2$ are generated by passing independent white noise streams through a first-order recursive filter. In relating the simulation parameters to $E_b/N_0$, keep in mind that the average symbol energy now is $E_s = E[|b[n]|^2]\,E[|h_1[n]|^2 + |h_2[n]|^2]$.

• Use the following maximal ratio combining rule to obtain the decision statistic

$$Z_2[n] = h_1^*[n]y_1[n] + h_2^*[n]y_2[n]$$


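The decision statistic above can be written as

$$Z_2[n] = \left(|h_1[n]|^2 + |h_2[n]|^2\right)b[n] + \tilde{w}[n]$$

where $\tilde{w}[n]$ is zero mean complex Gaussian with variance $\sigma^2\left(|h_1[n]|^2 + |h_2[n]|^2\right)$ per dimension. Thus, the instantaneous SNR is given by

$$\mathrm{SNR}[n] = \frac{E\left[\left|\left(|h_1[n]|^2 + |h_2[n]|^2\right)b[n]\right|^2\right]}{E\left[|\tilde{w}[n]|^2\right]} = \frac{\left(|h_1[n]|^2 + |h_2[n]|^2\right)E\left[|b[n]|^2\right]}{2\sigma^2}$$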


(g) Plot $|h_1[n]|^2 + |h_2[n]|^2$ in dB as a function of $n$, with 0 dB representing the average value as before. You should find that the fluctuations around the average are less than in (c).

(h) Implement a decision rule for the bits encoded in the QPSK symbols based on the statistics $\{Z_2[n]\}$. Estimate by simulation, and plot (on the same plot as in (f)), the bit error probability (log scale) as a function of the average $E_b/N_0$ (dB), where $E_b/N_0$ ranges from 0 to 30 dB. Use at least 10,000 symbols for your estimate. You should see an improvement compared to the situation with no diversity.
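The same kind of sketch extends to the MRC receiver of (h). It sums $h_l^*[n]y_l[n]$ over $L$ branches with independent AR(1) channels and independent noise. With $L$ branches, $E_s = 2 \cdot L \cdot 2\alpha^2$ and $E_b = E_s/2$.

The reference curve is the standard result for $L$-branch MRC over independent Rayleigh branches, applied here to each Gray coded QPSK bit:
$$P_b = \left(\frac{1-\mu}{2}\right)^L\sum_{k=0}^{L-1}\binom{L-1+k}{k}\left(\frac{1+\mu}{2}\right)^k, \qquad \mu = \sqrt{\bar{\gamma}_c/(1+\bar{\gamma}_c)}$$
Here $\bar{\gamma}_c = \bar{\gamma}/L$ is the average $E_b/N_0$ per branch under the normalization above. This formula comes from the wider literature rather than from this section, and the function names are hypothetical.

```python
import math
import random

def qpsk_mrc_ber(ebno_db, num_symbols, branches=2, rho=0.99, beta=0.01, seed=0):
    """Monte Carlo BER of Gray coded QPSK with L-branch MRC over AR(1) Rayleigh fading."""
    rng = random.Random(seed)
    alpha = beta / math.sqrt(1.0 - rho ** 2)       # stationary std per component
    h = [complex(rng.gauss(0.0, alpha), rng.gauss(0.0, alpha)) for _ in range(branches)]
    eb = 2.0 * branches * alpha ** 2               # E_s = 2 * L * 2 alpha^2, E_b = E_s / 2
    sigma = math.sqrt(eb / 10.0 ** (ebno_db / 10.0) / 2.0)   # sigma^2 = N0 / 2
    errors = 0
    for _ in range(num_symbols):
        b_i, b_q = rng.getrandbits(1), rng.getrandbits(1)
        b = complex(1 - 2 * b_i, 1 - 2 * b_q)
        z = 0j
        for hl in h:                               # Z_2[n] = sum_l h_l^*[n] y_l[n]
            y = hl * b + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            z += hl.conjugate() * y
        errors += (z.real < 0) != (b_i == 1)
        errors += (z.imag < 0) != (b_q == 1)
        h = [complex(rho * hl.real + rng.gauss(0.0, beta),
                     rho * hl.imag + rng.gauss(0.0, beta)) for hl in h]
    return errors / (2 * num_symbols)

def mrc_theory(ebno_db, branches):
    """L-branch MRC over iid Rayleigh branches; per-branch average SNR = (Eb/N0) / L."""
    gamma_c = 10.0 ** (ebno_db / 10.0) / branches
    mu = math.sqrt(gamma_c / (1.0 + gamma_c))
    return ((1.0 - mu) / 2.0) ** branches * sum(
        math.comb(branches - 1 + k, k) * ((1.0 + mu) / 2.0) ** k for k in range(branches))
```

As before, passing `rho=0.0` makes the fading independent across symbols, which speeds up convergence for a quick check against `mrc_theory`.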


Lab  Report:  Your  lab  report  should  document  the  results  of  the  preceding  steps  in  order. 
Describe  the  reasoning  you  used  and  the  difficulties  you  encountered. 

Bonus:  A  Glimpse  of  differential  modulation  and  demodulation 

Throughout this chapter, we have assumed that a noiseless “template” for the set of possible transmitted signals is available at the receiver. In the present context, it means assuming that estimates for the time-varying fading channel are available. But what if these estimates, which we used to generate the decision statistics earlier in this lab, are not available? One approach that
avoids  the  need  for  explicit  channel  estimation  is  based  on  exploiting  the  fact  that  the  channel 
does  not  change  much  from  symbol  to  symbol.  Let  us  illustrate  this  for  the  case  of  QPSK.  The 
model  is  exactly  as  in  (6.93)  or  (6.95),  but  the  channel  sequence(s)  is(are)  unknown  a  priori.  This 
necessitates  encoding  the  data  in  a  different  way.  Specifically,  let  d[n]  be  a  Gray  coded  QPSK 
information  sequence,  which  contains  information  about  the  bits  of  interest.  Instead  of  sending 
d[n]  directly,  we  generate  the  transmitted  sequence  b[n]  by  differential  encoding  as  follows: 

$$b[n] = d[n]\,b[n-1], \quad n = 1, 2, 3, 4, \ldots$$
(You can initialize $b[0]$ as any element of the constellation, known by agreement to both transmitter and receiver. Or, just ignore the first information symbol in your demodulation.) At the receiver, use differential demodulation to generate the decision statistic for the information symbol $d[n]$ as follows:

$$Z^{nc}[n] = y[n]y^*[n-1] \qquad \text{(single path)}$$

$$Z_2^{nc}[n] = y_1[n]y_1^*[n-1] + y_2[n]y_2^*[n-1] \qquad \text{(dual diversity)}$$

where the superscript nc indicates noncoherent demodulation, i.e., demodulation that does not require an explicit channel estimate.

Bonus assignment report: Estimate by simulation, and plot, the bit error probability of Gray coded differentially encoded QPSK as a function of $E_b/N_0$ for both single path and dual diversity.
Compare  with  the  curves  for  coherent  demodulation  that  you  have  obtained  earlier.  How  much 
(in  dB)  does  the  performance  degrade  by?  Document  your  results  as  in  the  earlier  lab  reports. 
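Here is a sketch for the bonus part. One detail the text leaves open is the scaling of $d[n]$. With points $\pm 1 \pm j$, the recursion $b[n] = d[n]b[n-1]$ would grow $|b[n]|$ by $\sqrt{2}$ per symbol. The sketch therefore assumes unit-magnitude symbols $d[n] = (\pm 1 \pm j)/\sqrt{2}$ and $b[0] = (1+j)/\sqrt{2}$. Then $E[|b[n]|^2] = 1$ and $E_b = L\alpha^2$ for $L$ branches. Also $y[n]y^*[n-1] \approx h[n]h^*[n-1]d[n]$, so the same sign rule as in the coherent case recovers the Gray coded bits of $d[n]$. No claim is made here about the size of the degradation; the simulation is meant to measure it. The function name is hypothetical.

```python
import math
import random

def dqpsk_ber(ebno_db, num_symbols, branches=1, rho=0.99, beta=0.01, seed=0):
    """Monte Carlo BER of Gray coded DQPSK with differential detection over AR(1) fading."""
    rng = random.Random(seed)
    alpha = beta / math.sqrt(1.0 - rho ** 2)
    h = [complex(rng.gauss(0.0, alpha), rng.gauss(0.0, alpha)) for _ in range(branches)]
    eb = branches * alpha ** 2          # E|b|^2 = 1: E_s = L * 2 alpha^2, E_b = E_s / 2
    sigma = math.sqrt(eb / 10.0 ** (ebno_db / 10.0) / 2.0)

    def noise():
        return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

    b = complex(1.0, 1.0) / math.sqrt(2.0)     # b[0], known to both ends
    y_prev = [hl * b + noise() for hl in h]
    errors = 0
    for _ in range(num_symbols):
        h = [complex(rho * hl.real + rng.gauss(0.0, beta),
                     rho * hl.imag + rng.gauss(0.0, beta)) for hl in h]
        d_i, d_q = rng.getrandbits(1), rng.getrandbits(1)
        d = complex(1 - 2 * d_i, 1 - 2 * d_q) / math.sqrt(2.0)
        b = d * b                               # differential encoding b[n] = d[n] b[n-1]
        y = [hl * b + noise() for hl in h]
        z = sum(yl * yp.conjugate() for yl, yp in zip(y, y_prev))   # Z^nc[n]
        errors += (z.real < 0) != (d_i == 1)
        errors += (z.imag < 0) != (d_q == 1)
        y_prev = y
    return errors / (2 * num_symbols)
```

Running this with `branches=1` and `branches=2` over a grid of $E_b/N_0$ values gives the two curves the bonus report asks for. They can be plotted next to the coherent results.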


6.A Irrelevance of component orthogonal to signal space


Conditioning on $H_i$, we have $y(t) = s_i(t) + n(t)$. The component of the received signal orthogonal to the signal space is given by

$$y^\perp(t) = y(t) - y_S(t) = y(t) - \sum_{k=0}^{n-1} Y[k]\psi_k(t) = s_i(t) + n(t) - \sum_{k=0}^{n-1} \left(s_i[k] + N[k]\right)\psi_k(t)$$

But the signal $s_i(t)$ lies in the signal space, so that

$$s_i(t) - \sum_{k=0}^{n-1} s_i[k]\psi_k(t) = 0$$


That is, the signal contribution to $y^\perp$ is zero, and

$$y^\perp(t) = n(t) - \sum_{k=0}^{n-1} N[k]\psi_k(t) = n^\perp(t)$$


where $n^\perp$ denotes the noise projection orthogonal to the signal space.

We now show that $n^\perp(t)$ is independent of the signal space noise vector $N$. Since $n^\perp$ and $N$ are jointly Gaussian, it suffices to show that they are uncorrelated. For any $t$ and $k$, we have


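$$\mathrm{cov}\left(n^\perp(t), N[k]\right) = E\left[n^\perp(t)N[k]\right] = E\left[\left(n(t) - \sum_{j=0}^{n-1} N[j]\psi_j(t)\right)N[k]\right] = E\left[n(t)N[k]\right] - \sum_{j=0}^{n-1} E\left[N[j]N[k]\right]\psi_j(t) \qquad (6.96)$$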


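The first term on the extreme right-hand side can be simplified as

$$E\left[n(t)\langle n, \psi_k\rangle\right] = E\left[n(t)\int n(s)\psi_k(s)\,ds\right] = \int E\left[n(t)n(s)\right]\psi_k(s)\,ds = \int \sigma^2\delta(s-t)\psi_k(s)\,ds = \sigma^2\psi_k(t) \qquad (6.97)$$

Plugging (6.97) into (6.96), and noting that $E[N[j]N[k]] = \sigma^2\delta_{jk}$, we obtain that

$$\mathrm{cov}\left(n^\perp(t), N[k]\right) = \sigma^2\psi_k(t) - \sigma^2\psi_k(t) = 0$$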
What we have just shown is that the component of the received signal orthogonal to the signal space contains the noise component only, and thus does not depend on which signal is sent under a given hypothesis. Since $n^\perp$ is independent of $N$, the noise vector in the signal space, knowing $n^\perp$ does not provide any information about $N$. These two observations imply that $y^\perp$ is irrelevant for our hypothesis testing problem. The preceding discussion is illustrated in Figure 6.9, and enables us to reduce our infinite-dimensional problem to a finite-dimensional vector model restricted to the signal space.

Note that our irrelevance argument depends crucially on the property of WGN that its projections along orthogonal directions are independent. Even though $y^\perp$ does not contain any signal component (since these by definition fall into the signal space), if $n^\perp$ and $N$ exhibited statistical dependence, one could hope to learn something about $N$ from $n^\perp$, and thereby improve performance compared to a system in which $y^\perp$ is thrown away. However, since $n^\perp$ and $N$ are independent for WGN, we can restrict attention to the signal space for our hypothesis testing problem.
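The covariance argument can also be checked numerically in a discretized setting, which we assume here purely for illustration:

- $n$ is a vector of iid $N(0, \sigma^2)$ samples, the discrete analog of WGN with $E[n(t)n(s)] = \sigma^2\delta(t-s)$.
- The signal space is spanned by two orthonormal vectors, obtained by Gram-Schmidt from random vectors.
- $N[k] = \langle n, \psi_k\rangle$ and $n^\perp = n - \sum_k N[k]\psi_k$.

The sample averages of $n^\perp(t)N[k]$ should be close to 0. Those of $n(t)N[k]$ should be close to $\sigma^2\psi_k(t)$, as in (6.97). Function names are hypothetical.

```python
import math
import random

def orthonormal_basis(vectors):
    """Classical Gram-Schmidt: orthonormal vectors spanning the inputs."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = sum(a * b for a, b in zip(w, q))
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis

def empirical_covariances(dim=8, trials=20000, sigma=1.0, seed=0):
    rng = random.Random(seed)
    psi = orthonormal_basis([[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)])
    cov_perp = [[0.0] * dim for _ in psi]   # estimates of E[n_perp(t) N[k]]
    cov_full = [[0.0] * dim for _ in psi]   # estimates of E[n(t) N[k]]
    for _ in range(trials):
        n = [rng.gauss(0.0, sigma) for _ in range(dim)]
        N = [sum(a * b for a, b in zip(n, q)) for q in psi]          # projections
        n_perp = [n[t] - sum(N[k] * psi[k][t] for k in range(len(psi)))
                  for t in range(dim)]
        for k in range(len(psi)):
            for t in range(dim):
                cov_perp[k][t] += n_perp[t] * N[k] / trials
                cov_full[k][t] += n[t] * N[k] / trials
    return psi, cov_perp, cov_full
```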
Chapter 7

Channel  Coding 


We have seen in Chapter 6 that, for signaling over an AWGN channel, the error probability decays exponentially with SNR, with the rate of decay determined by the power efficiency of the constellation. For example, for BPSK or Gray coded QPSK, the error probability is given by $P_e = Q\left(\sqrt{2E_b/N_0}\right)$. We have also seen in Chapter 6 how to engineer the link budget so as to guarantee a certain desired performance. So far, however, we have only considered uncoded systems, in which bits to be sent are directly mapped to symbols sent over the channel. We now indicate how it is possible to improve performance by channel coding, which corresponds to inserting redundancy strategically prior to transmission over the channel.

A  bit  of  historical  perspective  is  in  order.  As  mentioned  in  Chapter  1,  Shannon  showed  the 
optimality  of  separate  source  and  channel  coding  back  in  1948.  Shannon  also  provided  a  theory 
for  computing  the  limits  of  communication  performance  over  any  channel  (given  constraints 
such  as  power  and  bandwidth).  He  did  not  provide  a  constructive  means  of  attaining  these 
limits;  his  proofs  employed  randomized  constructions.  For  reasons  of  computational  complexity, 
it  was  assumed  that  such  strategies  could  never  be  practical.  Hence,  for  decades  after  Shannon’s 
1948  publication,  researchers  focused  on  algebraic  constructions  (for  which  decoding  algorithms 
of  reasonable  complexity  could  be  devised)  to  create  powerful  channel  codes,  but  never  quite 
succeeded in attaining Shannon’s benchmarks. This changed with the invention of turbo codes by Berrou et al. in 1993: their conference paper laid out a simple coding strategy that got to within a dB of Shannon capacity. They took codes which were easy to encode, and used scramblers to make them random-like. Maximum likelihood decoding for such codes is too computationally complex, but Berrou et al. showed that suboptimal iterative decoding methods provide excellent performance with reasonable complexity. It was then realized that a different class of random-like codes, called low density parity check (LDPC) codes, along with an appropriate iterative decoding procedure, had actually been invented by Gallager in the 1960s. Since then, there has been a massive effort to devise and implement a wide variety of “turbo-like” codes (i.e., random-like codes amenable to iterative decoding), with the result that we can now approach Shannon’s performance benchmarks over almost any channel.

In this chapter, we provide a glimpse of how Shannon’s performance benchmarks are computed, how channel codes are constructed, and how iterative decoding works. A systematic and comprehensive treatment of information theory and channel coding would fill entire textbooks, so our goal is to provide just enough exposure to some of the key ideas to encourage further exploration.

Chapter  Plan:  In  Section  7.1,  we  discuss  two  extreme  examples,  uncoded  transmission  and 
repetition  coding,  in  order  to  motivate  the  need  for  more  sophisticated  channel  coding  strategies. 
A  generic  model  for  channel  coding  is  discussed  in  Section  7.2.  Section  7.3  introduces  Shannon’s 
information-theoretic framework, which provides fundamental performance limits for any channel coding scheme, and discusses its practical implications. Linear codes, which are the most
prevalent  class  of  codes  used  in  practice,  are  introduced  in  Section  7.4.  Finally,  we  discuss  belief 
propagation  decoding,  which  has  been  crucial  for  approaching  Shannon  performance  limits  in 
practice,  in  Section  7.5. 


7.1  Motivation 


Figure  7.1:  Block  error  probability  versus  bit  error  probability  for  uncoded  transmission  (block 
size  is  1500  bytes). 


Uncoded transmission: First, let us consider what happens without channel coding. Suppose that we are sending a data block of 1500 bytes (i.e., $n = 12000$ bits, since 1 byte comprises 8 bits) over a binary symmetric channel (see Chapter 5) with bit error probability $p$, where errors occur independently for each bit. Such a BSC could be induced, for example, by making hard decisions for Gray coded QPSK over an AWGN channel; in this case, we have $p = Q\left(\sqrt{2E_b/N_0}\right)$.

Let us now define block error as the event that one or more bits in the block are in error. The probability that all of the bits get through correctly is given by $(1-p)^n$, so that the probability of block error is given by

$$P_B = 1 - (1-p)^n$$


Figure 7.1 plots the probability of block error versus the probability of bit error on a log-log scale. This simple computation does allow us to make some useful observations.

(a) For $p > 10^{-3}$, the probability of block error is essentially one. This is because the expected number of errors in the block is given by $np$. Once $np$ is of the order of one or larger, the probability of making at least one error, $1-(1-p)^n \approx 1-e^{-np}$, is substantial, and it approaches one rapidly as $np$ grows (for $n = 12000$ and $p = 10^{-3}$, $np = 12$). Using this reasoning, we see that it becomes harder and harder to guarantee reliability as the block size increases, since $p$ must scale as $1/n$. Clearly, this is not a sustainable approach. For example, even the corruption of a single bit in a large computer file can cause chaos, so we must find more sophisticated means of protecting the data than just trying to drive the raw bit error probability to zero.

(b) It is often possible to efficiently detect block errors with very high probability. In practice, this might be achieved by using a cyclic redundancy check (CRC) code, but we do not discuss the specific error detection mechanism here. If a block error is detected, then the receiver may ask the transmitter to retransmit the packet, if such retransmissions are supported by the underlying protocols. The link efficiency in this case becomes $1 - P_B$. Thus, if we can do retransmissions, uncoded transmission may actually not be a terrible idea. In our example, the link is 90% efficient ($P_B = 10^{-1}$) for bit error probability p of about $9 \times 10^{-6}$ (just below $10^{-5}$), and 99% efficient ($P_B = 10^{-2}$) for p of about $8 \times 10^{-7}$ (just below $10^{-6}$).

(c) For Gray coded QPSK, $p = Q\left(\sqrt{2E_b/N_0}\right)$, so that $p = 10^{-6}$ requires $E_b/N_0$ of about 10.5 dB.

This is exactly the scenario in the link budget example modeling a 5 GHz WLAN link in Chapter 6. We see, therefore, that uncoded transmission, along with retransmissions, is a viable option in that setting.
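The $E_b/N_0$ figure in (c) can be checked numerically. The sketch below assumes only the Gray coded QPSK relation $p = Q\left(\sqrt{2E_b/N_0}\right)$; it computes $Q(\cdot)$ from the standard library's `math.erfc` and inverts it by bisection. The helper names and the search interval are our own choices.

```python
import math


def qfunc(x: float) -> float:
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def qpsk_bit_error(ebno_db: float) -> float:
    """Bit error probability of Gray coded QPSK: p = Q(sqrt(2 Eb/N0))."""
    return qfunc(math.sqrt(2.0 * 10.0 ** (ebno_db / 10.0)))


def ebno_db_for_bit_error(p: float, lo: float = -10.0, hi: float = 30.0) -> float:
    """Bisection for the Eb/N0 (dB) at which qpsk_bit_error equals p (p decreases with Eb/N0)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if qpsk_bit_error(mid) > p:
            lo = mid  # not enough SNR yet
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for p in (1e-5, 1e-6):
        print(f"p = {p:.0e} needs Eb/N0 of about {ebno_db_for_bit_error(p):.2f} dB")
```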


Figure  7.2:  Error  probability  decays  rapidly  as  a  function  of  blocklength  for  a  repetition  code. 


Repetition coding: Next, let us consider the other extreme, in which we send n copies of a single bit over a BSC with error probability p. That is, we either send a string of n zeros, or a string of n ones. The channel may flip some of these bits; since the errors are independent, the number of errors is a binomial random variable, Bin(n, p). For $p < 1/2$, the average number of bits in error satisfies $np < n/2$, hence a natural decoding rule is to employ majority logic: decide on 0 if the majority of received bits are zero, and on 1 otherwise. Taking n to be odd for simplicity (otherwise we need to specify a tiebreaker when there are an equal number of zeros and ones), a block error occurs if the number of errors is $\lceil n/2 \rceil$ or more. Using the binomial PMF, we have the following expression for the block error probability:

$$P_B = \sum_{m=\lceil n/2 \rceil}^{n} \binom{n}{m} p^m (1-p)^{n-m}$$

Figure 7.2 plots the probability of block error versus n for two values of p. Clearly, $P_B \to 0$ as $n \to \infty$, so we are doing well in terms of reliability. To see why, let us invoke the LLN again: the average number of errors is $np < \lceil n/2 \rceil$, so that, as $n \to \infty$, the number of errors is smaller than $\lceil n/2 \rceil$ with probability tending to one. However, we are only sending one bit of information for every n bits that we send over the channel, corresponding to a code rate of $1/n$, which tends to zero as $n \to \infty$.
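As a sketch of the block error expression above (our own illustrative code, with an assumed crossover probability p = 0.01 in the demo), the following evaluates the binomial tail exactly using `math.comb` and prints it alongside the code rate 1/n.

```python
import math


def repetition_block_error(n: int, p: float) -> float:
    """Block error probability of an n-fold repetition code (n odd) with majority-logic
    decoding over a BSC with crossover probability p."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that majority logic never ties")
    t = (n + 1) // 2  # smallest number of flips that fools the majority vote
    return sum(math.comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(t, n + 1))


if __name__ == "__main__":
    p = 0.01  # illustrative crossover probability
    for n in (1, 3, 5, 9, 15):
        print(f"n = {n:2d}: rate = 1/{n}, P_B = {repetition_block_error(n, p):.2e}")
```

Reliability improves quickly with n, but only at the cost of a rate that shrinks as 1/n.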

We have invoked the LLN to explain the performance of both uncoded transmission and repetition coding for large n, but neither of these approaches provides reliable performance at nonzero code rates. As $n \to \infty$, the block error probability $P_B \to 1$ for uncoded transmission, while the code rate tends to zero for the repetition code. However, it is possible to design channel coding schemes between these two extremes which provide arbitrarily reliable communication ($P_B \to 0$ as $n \to \infty$) at non-vanishing code rates. The existence of such codes is guaranteed by LLN-style arguments. For example, for a BSC with crossover probability p, as n gets large, the number of errors clusters around np. Thus, the basic intuition is that, if we are able to insert enough redundancy to correct a number of errors of the order of np, then we should be able to approach zero block error probability. Giving precise form to such existence arguments is the realm of information theory, which can be used to establish fundamental performance limits for almost any reasonable channel model, while coding theory concerns itself with constructing practical coding schemes that approach these performance limits. A detailed exposition of information and coding theory is well beyond our scope, but our goal here is to provide just enough exposure to stimulate and guide further exploration.
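The clustering of the number of errors around np is easy to see numerically. This sketch (our own, with illustrative values p = 0.1 and a 20% excess over the mean) computes the exact binomial probability that the error count exceeds 1.2np. Working with `math.lgamma` in the log domain avoids overflowing floats for large n. The probability shrinks rapidly as n grows.

```python
import math


def prob_errors_exceed(n: int, p: float, frac: float) -> float:
    """P[Bin(n, p) > frac * n], summing PMF terms computed in the log domain."""
    k0 = math.floor(frac * n) + 1
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for m in range(k0, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
                   + m * log_p + (n - m) * log_q)
        total += math.exp(log_pmf)
    return total


if __name__ == "__main__":
    p = 0.1
    for n in (100, 1000, 10000):
        # chance that the error count exceeds its mean np by more than 20%
        print(f"n = {n:5d}: P[errors > 1.2 np] = {prob_errors_exceed(n, p, 1.2 * p):.2e}")
```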


7.2  Model  for  Channel  Coding 

We  introduce  some  basic  terminology  related  to  channel  coding,  and  discuss  where  it  fits  within 
a  communication  link. 

Binary code: An (n, k) binary code maps k information bits to n transmitted bits, where $n > k$. Each of the k information bits can take any value in {0, 1}, hence the code C is a set of $2^k$ codewords, each a binary vector of length n. The code rate is defined as $R_c = k/n$.

Figure 7.3: High-level model for coded system.


Figure 7.3 provides a high-level view of how a binary channel code can be used over a communication link. The encoder maps the k-bit information word u to an n-bit codeword x. As discussed shortly, the “channel” shown in the figure is an abstraction that includes operations at the transmitter and the receiver, in addition to the physical channel. The output y of the channel is a length-n vector of hard decisions (bits) or soft decisions (real numbers) on the coded bits. These are then used by the decoder to provide an estimate $\hat{u}$ of the information bits. We declare a block error if $\hat{u} \neq u$.




Figure  7.4:  An  example  of  bit  interleaved  coded  modulation. 




Figure 7.4 provides a specific example illustrating how the preceding abstraction connects to the transceiver design framework developed in earlier chapters. It shows a binary code used for signaling over an AWGN channel using Gray coded 16QAM. We see that n coded bits are mapped to n/4 complex-valued symbols at the transmitter. Since channel codes are typically designed for random errors, we have inserted an interleaver between the channel encoder and the modulator in order to disperse potential correlations in errors among bits. The modulator could employ linear modulation as described in Chapter 4, with demodulation as in Chapter 6 for an ideal AWGN channel, or more sophisticated equalization strategies for handling the intersymbol interference due to channel dispersion (see Chapter 8). An alternative frequency domain modulation strategy for handling channel dispersion, termed Orthogonal Frequency Division Multiplexing (OFDM), is also discussed in Chapter 8. However, for our present purpose of discussing channel coding, we abstract all of these details away. Indeed, as shown in Figure 7.4, the “channel” from Figure 7.3 includes all of these operations, with the final output being the hard or soft decisions supplied to the decoder. Problem 7.4 explores the nature of this equivalent channel for some example constellations. Often, even if the physical channel has memory, the interleaving and deinterleaving operations allow us to model the equivalent channel as memoryless: the output $y_i$ depends only on the coded bit $x_i$, and the channel is completely characterized by the conditional density $p(y_i|x_i)$. For example, for hard decisions, we may model the equivalent channel as a binary symmetric channel with error probability p. For soft decisions, $y_i$ may be a real number, or may comprise several bits, hence the channel model would be a little more complicated.
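To make the transmit side of Figure 7.4 concrete, here is a minimal Python sketch of interleaving followed by Gray coded 16QAM mapping. The specific Gray labeling, the seeded pseudo-random permutation, and all function names are illustrative assumptions; practical standards define their own interleavers and bit labelings. Note that `deinterleave` works equally well on hard bits or on soft values, which is what the receiver passes to the decoder.

```python
import random

# One Gray labeling of 4-PAM per real dimension (an illustrative choice):
# adjacent amplitude levels differ in exactly one bit.
GRAY_4PAM = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def make_interleaver(n: int, seed: int = 0) -> list[int]:
    """Hypothetical interleaver: a seeded pseudo-random permutation of range(n)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def interleave(bits, perm):
    """Output position j carries input position perm[j]."""
    return [bits[i] for i in perm]


def deinterleave(values, perm):
    """Undo interleave(); works on hard bits or on soft values from the demodulator."""
    out = [None] * len(perm)
    for j, i in enumerate(perm):
        out[i] = values[j]
    return out


def map_16qam(bits):
    """Map each group of 4 coded bits to one Gray coded 16QAM symbol (2 bits for I, 2 for Q)."""
    if len(bits) % 4:
        raise ValueError("number of coded bits must be a multiple of 4")
    return [complex(GRAY_4PAM[(bits[k], bits[k + 1])], GRAY_4PAM[(bits[k + 2], bits[k + 3])])
            for k in range(0, len(bits), 4)]


if __name__ == "__main__":
    n = 16
    rng = random.Random(1)
    coded = [rng.randint(0, 1) for _ in range(n)]  # stand-in for the encoder output x
    perm = make_interleaver(n)
    symbols = map_16qam(interleave(coded, perm))   # n/4 complex-valued symbols
    print(len(symbols), "symbols:", symbols)
    assert deinterleave(interleave(coded, perm), perm) == coded
```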

The preceding approach, which neatly separates out the binary channel code from the signal processing related to transmitting and receiving over a physical channel, is termed bit interleaved coded modulation (BICM). If we use a binary code of rate $R_c = k/n$ and a symbol alphabet of size M, then the overall rate of communication over the channel is given by $R_c \log_2 M$ bits per symbol. From Chapter 4, we know that, using ideal Nyquist signaling, we can signal at rate W complex-valued symbols/sec over a bandlimited passband channel of bandwidth W. Thus, the rate of communication in bits per second (bps) is given by $R_b = R_c W \log_2 M$. The bandwidth efficiency, or spectral efficiency, can now be defined as

$$r = \frac{R_b}{W} = R_c \log_2 M = \frac{k}{n} \log_2 M \tag{7.1}$$

in bps/Hz, or bits/symbol (for ideal Nyquist signaling). Compared with the bandwidth efficiency of uncoded modulation in Chapter 4, what has changed is that we must now account for the rate of the binary code that we have wrapped around our communication link.

We also need to revisit our SNR concepts and carefully keep track of information bits, coded bits, and modulated symbols, when computing signal power or energy. The quantity $E_b$ refers to energy per information bit. When we encode these bits using a binary code of rate $R_c$, the energy per coded bit is $E_c = R_c E_b$ (information bits per coded bit, times energy per information bit). When we then put the coded bits through a modulator that outputs M-ary symbols ($\log_2 M$ coded bits per symbol), we obtain that the energy per modulated symbol is given by

$$E_s = E_c \log_2 M = R_c E_b \log_2 M$$

In short, we have

$$E_s = r E_b \tag{7.2}$$

which makes sense: energy per symbol equals the number of information bits per symbol, times the energy per information bit. While we have established (7.2) for BICM, it holds generally, since it is just a matter of energy bookkeeping.
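The bookkeeping in (7.1) and (7.2) reduces to a few lines of arithmetic. The sketch below uses illustrative numbers (a rate 1/2 code with 16QAM over an assumed 20 MHz channel, and unit energy per information bit); the function name is hypothetical.

```python
import math


def bicm_budget(code_rate: float, M: int, W_hz: float, Eb: float):
    """Spectral efficiency r (7.1), bit rate Rb, and energies per coded bit and per symbol (7.2)."""
    bits_per_symbol = math.log2(M)        # coded bits per M-ary symbol
    r = code_rate * bits_per_symbol       # information bits per symbol, i.e., bps/Hz
    Rb = r * W_hz                         # ideal Nyquist signaling: W symbols per second
    Ec = code_rate * Eb                   # energy per coded bit
    Es = Ec * bits_per_symbol             # energy per symbol, which equals r * Eb
    return r, Rb, Ec, Es


if __name__ == "__main__":
    r, Rb, Ec, Es = bicm_budget(0.5, 16, 20e6, 1.0)
    print(f"r = {r} bps/Hz, Rb = {Rb / 1e6} Mbps, Ec = {Ec}, Es = {Es}")
```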

BICM is a practical approach which applies to any physical communication channel, and the significant advances in channel coding over the past two decades ensure that there is little loss in optimality due to this decoupling of coding and modulation. In the preceding example, we have used it in conjunction with Nyquist sampling, which transforms the continuous time channel into a discrete time channel carrying complex-valued symbols. However, we can also view the Nyquist sampled channel in greater generality, in which the inputs to the effective channel are complex-valued symbols, and the outputs are the noisy received samples at the output of the equalizer/demodulator. A code of rate R bits/channel use over this channel is simply a collection of $2^{NR}$ discrete time complex-valued vectors of length N, where N is the number of symbols sent over the channel. In our BICM example with 16QAM, we have $R = R_c \log_2 M = 4R_c$ and $N = n/4$ (so that $2^{NR} = 2^k$, as it should be), but this framework also accommodates approaches which tie coding and modulation more closely together. The tools of information theory can be used to provide fundamental performance limits for any such coded modulation strategy. We provide a glimpse of such results in the next section.


7.3  Shannon’s  Promise 

Shannon established the field of information theory in the 1940s. Among its many consequences is the channel coding theorem, which states that, if we allow code block lengths to get large enough, then there is a well-defined quantity called channel capacity, which determines the maximum rate at which reliable communication can take place. A class of channel models of fundamental importance is the following.

Discrete memoryless channel (DMC): Inputs are fed to the channel in discrete time, and the output y at any given time has conditional density $p(y|x)$, if x is the channel input at that time. For multiple channel uses, the outputs are conditionally independent given the inputs, as follows:

$$p(y_1, \ldots, y_n | x_1, \ldots, x_n) = p(y_1|x_1) \cdots p(y_n|x_n)$$

The inputs may be constrained in some manner (e.g., to take values from a finite alphabet, or to be limited in average or peak power). A channel code of length n and rate R bits per channel use contains $2^{nR}$ codewords. That is, we employ M-ary signaling with $M = 2^{nR}$, where each signal, or codeword, is a vector of length n, with the jth codeword denoted by

$$\mathbf{x}^{(j)} = \left(x_1^{(j)}, \ldots, x_n^{(j)}\right)^T, \quad j = 1, \ldots, 2^{nR}$$

Shannon’s channel coding theorem gives us a compact characterization of the channel capacity C (in bits per channel use) for a DMC. It states that, for any code rate below capacity ($R < C$), and for large enough block length n, there exist codes and decoding strategies such that the block error probability can be made arbitrarily small. The converse of this result also holds: for code rates above capacity ($R > C$), the block error probability is bounded away from zero for any coding strategy. The fundamental intuition is that, for large block lengths, events that cause errors cluster around some well-defined patterns with very high probability (because of the law of large numbers), hence it is possible to devise channel codes that can correct these patterns as long as we are not trying to fit in too many codewords.

Giving  the  expression  for  the  Shannon  capacity  of  a  general  DMC  is  beyond  our  scope,  but 
we  do  provide  intuitive  derivations  of  the  channel  capacity  for  the  two  DMC  models  of  greatest 
importance  to  us:  the  discrete  time  AWGN  channel  and  the  BSC.  We  then  discuss,  via  numerical 
examples,  how  these  capacity  computations  can  be  used  to  establish  design  guidelines. 

Discrete time AWGN channel: Let us consider the following real-valued discrete time AWGN channel model, where we send a codeword consisting of a sequence of real numbers $\{X_i, i = 1, \ldots, n\}$, and obtain the noisy outputs

$$Y_i = X_i + N_i, \quad i = 1, \ldots, n \tag{7.3}$$

where $N_i \sim \mathcal{N}(0, N)$ are i.i.d. Gaussian noise samples. We impose a power constraint $\mathbb{E}[X_i^2] \le S$.
This model is called the discrete time AWGN channel. For Nyquist signaling over a continuous-time bandlimited AWGN channel with bandwidth W, we can signal at the rate of W complex-valued symbols per second, or 2W real-valued symbols/second. This can be interpreted as getting to use the discrete time AWGN channel (7.3) 2W times per second. Thus, once we figure out the capacity for the discrete time AWGN channel in bits per channel use, we will be able to specify the maximum rate at which information can be transmitted reliably over a bandlimited continuous time channel.

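A channel code over the discrete time channel (7.3) of rate R bits/channel use contains $2^{nR}$ codewords, where the jth codeword $\mathbf{X}^{(j)} = \left(X_1^{(j)}, \ldots, X_n^{(j)}\right)^T$ satisfies the average power constraint if

$$\|\mathbf{X}^{(j)}\|^2 = \sum_{i=1}^{n} \left(X_i^{(j)}\right)^2 \le nS$$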

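If the jth codeword is sent, the received vector is

$$\mathbf{Y} = \mathbf{X}^{(j)} + \mathbf{N}, \quad \text{codeword } j \text{ transmitted} \tag{7.4}$$

where $\mathbf{N} = (N_1, \ldots, N_n)^T$ is the noise vector. The expected energy of the noise vector equals

$$\mathbb{E}\left[\|\mathbf{N}\|^2\right] = \sum_{i=1}^{n} \mathbb{E}\left[N_i^2\right] = nN \tag{7.5}$$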


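The expected energy of the received vector equals

$$\mathbb{E}\left[\|\mathbf{Y}\|^2\right] = \sum_{i=1}^{n} \mathbb{E}\left[\left(X_i^{(j)} + N_i\right)^2\right] = \sum_{i=1}^{n} \left( \left(X_i^{(j)}\right)^2 + \mathbb{E}\left[N_i^2\right] + 2 X_i^{(j)} \mathbb{E}[N_i] \right) = \|\mathbf{X}^{(j)}\|^2 + nN \le n(S + N) \tag{7.6}$$

(The cross term involving signal and noise drops away, since they are independent and the noise is zero mean.)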

We now provide a heuristic argument as to how reliable performance can be achieved by letting the code block length n get large. Invoking the law of large numbers, random quantities cluster around their averages with high probability, so that the received vector $\mathbf{Y}$ lies (approximately) inside an n-dimensional sphere of radius $\sqrt{n(S+N)}$, and the noise vector $\mathbf{N}$ lies (approximately) inside an n-dimensional sphere of radius $\sqrt{nN}$. Consider now a “decoding sphere” around each codeword with radius just a little larger than $\sqrt{nN}$. Then we make correct decisions with high probability: if we send codeword j, the noise vector $\mathbf{N}$ is highly unlikely to push us outside the decoding sphere centered around $\mathbf{X}^{(j)}$. The question then is: what is the largest number of decoding spheres of radius $\sqrt{nN}$ that we can pack inside the n-dimensional sphere of radius $\sqrt{n(S+N)}$ in which the received vector $\mathbf{Y}$ lives? This sphere packing argument, depicted in Figure 7.5, provides an estimate of the largest number of codewords $2^{nR}$ that we can accommodate while guaranteeing reliable communication.
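The “sphere hardening” step of this heuristic, namely that $\|\mathbf{N}\|$ concentrates near $\sqrt{nN}$ for large n, can be checked with a quick Monte Carlo sketch. It uses Python's `random.gauss` with a fixed seed; the trial counts and block lengths are arbitrary illustrative choices.

```python
import math
import random


def normalized_noise_norm(n: int, noise_var: float, rng: random.Random) -> float:
    """One draw of ||N|| / sqrt(n * noise_var), where N has i.i.d. N(0, noise_var) entries."""
    sigma = math.sqrt(noise_var)
    energy = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n))
    return math.sqrt(energy / (n * noise_var))


if __name__ == "__main__":
    rng = random.Random(2024)
    for n in (10, 100, 10000):
        draws = [normalized_noise_norm(n, 1.0, rng) for _ in range(20)]
        # the spread around 1 shrinks roughly like 1/sqrt(n)
        print(f"n = {n:5d}: ||N|| / sqrt(nN) in [{min(draws):.3f}, {max(draws):.3f}]")
```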

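We now invoke the result that the volume of an n-dimensional sphere of radius r is $K_n r^n$, where $K_n$ is a constant depending on n whose explicit form we do not need. We can now estimate the maximum achievable rate R as follows:

$$2^{nR} \le \frac{K_n \left(\sqrt{n(S+N)}\right)^n}{K_n \left(\sqrt{nN}\right)^n} = \left(1 + \frac{S}{N}\right)^{n/2}$$

from which we obtain

$$R \le \frac{1}{2} \log_2\left(1 + \frac{S}{N}\right)$$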
Figure 7.5: Sphere packing argument for characterizing rate of reliable communication.


While  we  have  used  heuristic  arguments  to  arrive  at  this  result,  it  can  actually  be  rigorously 
demonstrated  that  the  right-hand  side  of  (7.7)  is  indeed  the  maximum  possible  rate  of  reliable 
communication  over  the  discrete-time  AWGN  channel. 

Capacity of the discrete time AWGN channel: We can now state that the capacity of the discrete time AWGN channel (7.3) is given by

$$C_{\text{D-AWGN}} = \frac{1}{2} \log_2\left(1 + \frac{S}{N}\right) \text{ bits/channel use} \tag{7.7}$$


Continuous time bandlimited AWGN channel: We now use this result to compute the maximum spectral efficiency attainable over a continuous time bandlimited AWGN channel. The complex baseband channel corresponding to a passband channel of physical bandwidth W spans $[-W/2, W/2]$ (taking the reference frequency at the center of the passband). Thus, Nyquist signaling over this channel corresponds to W complex-valued symbols per second, or 2W uses per second of the real-valued channel (7.3). Since each complex-valued symbol corresponds to two uses of the real discrete time AWGN channel, the capacity of the bandlimited channel is given by $2W C_{\text{D-AWGN}}$ bits per second. We still need to specify $S/N$. For each complex-valued sample, the energy per symbol is $E_s = r E_b$ (bits/symbol, times energy per bit, gives energy per complex-valued symbol). The noise variance per real dimension is $\sigma^2 = N_0/2$, hence the noise variance seen by a complex symbol is $2\sigma^2 = N_0$. Since both the signal energy $E_s$ and the noise variance $N_0$ are split equally between the two real dimensions, we obtain

$$\frac{S}{N} = \frac{E_s}{N_0} \tag{7.8}$$


Putting these observations together, we can now state the following formula for the capacity of the bandlimited AWGN channel.

Capacity of the bandlimited AWGN channel:

$$C_{\text{BL}}\left(W, \frac{E_s}{N_0}\right) = W \log_2\left(1 + \frac{E_s}{N_0}\right) \text{ bits per second} \tag{7.9}$$
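A minimal sketch of (7.7) through (7.9), assuming an illustrative 20 MHz channel. It simply encodes the observation that a bandlimited channel offers 2W real channel uses per second, each with $S/N = E_s/N_0$; the function names are our own.

```python
import math


def capacity_discrete_awgn(snr: float) -> float:
    """(7.7): capacity of the real discrete time AWGN channel, in bits per channel use."""
    return 0.5 * math.log2(1.0 + snr)


def capacity_bandlimited(W_hz: float, es_over_n0: float) -> float:
    """(7.9): 2W real channel uses per second, each with S/N = Es/N0 as in (7.8)."""
    return 2.0 * W_hz * capacity_discrete_awgn(es_over_n0)


if __name__ == "__main__":
    W = 20e6  # illustrative bandwidth
    for esn0_db in (0, 10, 20):
        esn0 = 10 ** (esn0_db / 10)
        print(f"Es/N0 = {esn0_db:2d} dB: C = {capacity_bandlimited(W, esn0) / 1e6:.1f} Mbps")
```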
It can be checked that we get exactly the same result for a physical (real-valued) baseband channel of physical bandwidth W. Such a channel spans $[-W, W]$, but the transmitted signal is constrained to be real-valued. Signals over such a channel can therefore be represented by 2W real-valued samples per second, which is the same as for a passband channel of bandwidth W.

For a system communicating reliably at a bit rate of $R_b$ bps over such a bandlimited channel, we must have $R_b < C_{\text{BL}}$. Using (7.9), we see that the spectral efficiency $r = R_b/W$ in bps/Hz of the system must therefore satisfy

$$r < \log_2\left(1 + \frac{E_s}{N_0}\right) = \log_2\left(1 + r\frac{E_b}{N_0}\right) \text{ bps/Hz or bits/complex symbol} \tag{7.10}$$

where we have used (7.2).

The  preceding  defines  the  regime  where  reliable  communication  is  possible.  We  can  rewrite  (7.10) 
to  obtain  the  fundamental  limits  on  the  power-bandwidth  tradeoffs  achievable  over  the  AWGN 
channel,  as  follows. 


Figure  7.6:  Power-bandwidth  tradeoffs  over  the  AWGN  channel. 


Fundamental power-bandwidth tradeoff for the bandlimited AWGN channel:

$$\frac{E_b}{N_0} > \frac{2^r - 1}{r} \quad \text{(regime where reliable communication is possible)} \tag{7.11}$$
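The curve in Figure 7.6 can be regenerated from (7.11). This sketch tabulates the minimum $E_b/N_0$ for a few spectral efficiencies and compares small-r values with the limiting value $\ln 2$ (about -1.59 dB); the chosen r values are arbitrary.

```python
import math


def min_ebno(r: float) -> float:
    """(7.11): smallest Eb/N0 (linear) that permits reliable communication at r bps/Hz."""
    return (2.0 ** r - 1.0) / r


def to_db(x: float) -> float:
    return 10.0 * math.log10(x)


if __name__ == "__main__":
    for r in (4.0, 2.0, 1.0, 0.5, 0.1, 0.001):
        print(f"r = {r:6}: Eb/N0 > {to_db(min_ebno(r)):6.2f} dB")
    # as r -> 0, (2^r - 1)/r -> ln 2
    print(f"r -> 0 limit: {to_db(math.log(2.0)):.2f} dB")
```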


The quantity r is the bandwidth efficiency of the system, as defined in (7.1). This fundamental power-bandwidth tradeoff is depicted in Figure 7.6, which plots the minimum required $E_b/N_0$ (dB) versus the bandwidth efficiency. Note that we cannot make the $E_b/N_0$ required for reliable communication arbitrarily small even as the bandwidth efficiency goes to zero. We leave it as an exercise (Problem 7.6) to show that the minimum possible value of $E_b/N_0$ for reliable communication (corresponding to $r \to 0$) over the AWGN channel is -1.6 dB. Another point worth emphasizing is that these power-bandwidth tradeoffs assume powerful channel coding, and are different from those discussed for uncoded systems in Chapter 6: recall that the bandwidth efficiency for M-ary uncoded linear modulation was equal to $\log_2 M$, and that the power efficiency was defined as $d_{\min}^2/E_b$. Numerical examples showing how channel coding fundamentally changes the achievable power-bandwidth tradeoffs are explored in more detail in Section 7.3.1. We provide here a quick example that illustrates how (7.11) relates to real world scenarios.


Example 7.3.1 (evaluating system feasibility using Shannon limits) A company claims to have developed a wireless modem with a receiver sensitivity of -82 dBm and a noise figure of 6 dB, operating at a rate of 100 Mbps over a bandwidth of 20 MHz. Do you believe their claim?

Shannon limit calculations: Modeling the channel as an ideal bandlimited AWGN channel, the proposed modem must satisfy (7.11). Assuming no excess bandwidth, $r = \frac{100 \text{ Mbps}}{20 \text{ MHz}} = 5$ bps/Hz or bits/symbol. From (7.11), we know that we must have $(E_b/N_0)_{\text{required}} > \frac{2^5 - 1}{5} = 6.2$, or 7.9 dB. The noise PSD $N_0$ is given by -174 + 6 = -168 dBm/Hz. The energy per bit equals the received power divided by the bit rate, so that the actual $E_b/N_0$ for the advertised receiver sensitivity (i.e., the receive power at which the modem can operate) is given by $(E_b/N_0)_{\text{actual}} = -82 - 10\log_{10}(10^8) + 168 = 6$ dB. This is 1.9 dB short of the Shannon limit, hence our first instinct is not to believe them.

Tweaking the channel model: What if the channel was not a single AWGN channel, but two AWGN channels in parallel? As we shall see when we discuss multiple antenna systems in Chapter 8, it is possible to use multiple antennas at the transmitter and receiver to obtain spatial degrees of freedom in addition to those in time and frequency. If there are indeed two spatial channels that are created using multiple antennas and we can model each of them as AWGN, then the spectral efficiency per channel is 5/2 = 2.5 bps/Hz, and $(E_b/N_0)_{\text{required}} > \frac{2^{2.5} - 1}{2.5} \approx 1.86$, or 2.7 dB. Since the actual $E_b/N_0$ is 6 dB, the system is operating more than 3 dB away from the Shannon limit. Since we do have practical channel codes that get to within a dB or less of Shannon capacity, the claim now becomes believable.
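The arithmetic of Example 7.3.1 can be packaged as a short script. It assumes, as the example does, an ideal bandlimited AWGN channel with no excess bandwidth and a thermal noise floor of -174 dBm/Hz; for the two-stream case it assumes the bit rate splits evenly across two parallel AWGN channels. Function names are our own.

```python
import math


def db(x: float) -> float:
    return 10.0 * math.log10(x)


def required_ebno_db(r: float) -> float:
    """Shannon limit (7.11) in dB for spectral efficiency r (bps/Hz)."""
    return db((2.0 ** r - 1.0) / r)


def actual_ebno_db(sensitivity_dbm: float, noise_figure_db: float, bit_rate: float) -> float:
    """Eb = P / Rb and N0 = -174 dBm/Hz + NF, so Eb/N0 (dB) = P - 10 log10(Rb) - N0."""
    n0_dbm_per_hz = -174.0 + noise_figure_db
    return sensitivity_dbm - db(bit_rate) - n0_dbm_per_hz


if __name__ == "__main__":
    Rb, W = 100e6, 20e6
    have = actual_ebno_db(-82.0, 6.0, Rb)
    for streams in (1, 2):
        r = Rb / W / streams  # spectral efficiency per spatial channel
        need = required_ebno_db(r)
        print(f"{streams} stream(s): r = {r} bps/Hz, need > {need:.1f} dB, "
              f"have {have:.1f} dB, margin {have - need:+.1f} dB")
```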


Figure  7.7:  Binary  symmetric  channel  with  crossover  p. 


Binary symmetric channel: We now turn our attention to the BSC with crossover probability p shown in Figure 7.7, which might, for example, be induced by hard decisions on an AWGN channel. Note that we are only interested in $0 \le p \le 1/2$. If $p > 1/2$, then we can switch zeros and ones at the output of the channel to get back to a BSC with crossover probability $p' = 1 - p < 1/2$. The BSC can also be written as an additive noise channel, analogous to the discrete time AWGN channel (7.3):

$$Y_i = X_i \oplus N_i, \quad i = 1, \ldots, n \tag{7.12}$$

where the exclusive or (XOR) symbol $\oplus$ corresponds to addition modulo 2, which follows the rules:

$$0 \oplus 0 = 1 \oplus 1 = 0, \qquad 1 \oplus 0 = 0 \oplus 1 = 1 \tag{7.13}$$

Thus, we can flip a bit by adding (modulo 2) a 1 to it. The probability of a bit flip is p. Thus, the noise variables $N_i$ are i.i.d. Bernoulli random variables with $P[N_i = 1] = p = 1 - P[N_i = 0]$.
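The additive noise view (7.12) translates directly into a simulation: XOR each input bit with an independent Bernoulli(p) noise bit. This is a sketch with a fixed seed and illustrative values of n and p; the count of positions where input and output differ should come out close to np. The function names are hypothetical.

```python
import random


def bsc(bits, p: float, rng: random.Random):
    """BSC as in (7.12): Y_i = X_i XOR N_i, with N_i i.i.d. Bernoulli(p)."""
    return [x ^ (1 if rng.random() < p else 0) for x in bits]


def count_differences(a, b) -> int:
    """Number of positions where two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 10000, 0.1
    x = [rng.randint(0, 1) for _ in range(n)]
    y = bsc(x, p, rng)
    print(f"{count_differences(x, y)} bits flipped; np = {n * p:.0f}")
```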

Just as with the AWGN channel, we now develop a sphere packing argument to provide an intuitive derivation of the BSC channel capacity. Of course, our concept of distance must be different from the Euclidean distance considered for the AWGN channel. Define the Hamming distance between two binary vectors of equal length to be the number of places in which they differ. For a codeword of length n, the average number of errors equals np. Assuming that the number of errors clusters around this average for large n, define a decoding sphere around a codeword as all sequences which are at Hamming distance of np or less from it. The number of such sequences is called the volume of the decoding sphere. By virtue of (7.12) and (7.13), we see that this volume is exactly equal to the number of noise vectors $\mathbf{N} = (N_1, \ldots, N_n)^T$ with np or fewer ones (the number of ones in a sequence is called its weight). The number of length-n vectors with weight m equals $\binom{n}{m}$, hence the number of vectors with weight at most np is given by

$$V_n = \sum_{m=0}^{\lfloor np \rfloor} \binom{n}{m} \tag{7.14}$$

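We state without proof the following asymptotic approximation for $V_n$ for large n:

$$V_n \approx 2^{n H_b(p)}, \quad 0 \le p \le \frac{1}{2} \tag{7.15}$$

(in the sense that $\frac{1}{n}\log_2 V_n \to H_b(p)$), where $H_b(\cdot)$ is the binary entropy function, defined by

$$H_b(p) = -p \log_2 p - (1-p) \log_2(1-p), \quad 0 \le p \le 1 \tag{7.16}$$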

We plot the binary entropy function in Figure 7.8(a). Note the symmetry around $p = 1/2$. This is because, as mentioned earlier, we can map $p > 1/2$ to $1 - p < 1/2$ by switching the roles of 0 and 1 at the output.
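To see how good the approximation (7.15) is, the sketch below computes $\frac{1}{n}\log_2 V_n$ exactly, using Python's arbitrary-precision integers for the binomial sum (7.14), and compares it with $H_b(p)$. The values of n and p are illustrative; the gap shrinks as n grows, on the order of $(\log n)/n$.

```python
import math


def binary_entropy(p: float) -> float:
    """H_b(p) from (7.16), with the convention 0 * log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def log2_ball_volume(n: int, p: float) -> float:
    """log2 of V_n in (7.14), summed exactly with Python's big integers."""
    v = sum(math.comb(n, m) for m in range(math.floor(n * p) + 1))
    return math.log2(v)


if __name__ == "__main__":
    p = 0.1
    for n in (50, 500, 5000):
        print(f"n = {n:4d}: (1/n) log2 V_n = {log2_ball_volume(n, p) / n:.4f}, "
              f"H_b(p) = {binary_entropy(p):.4f}")
```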


Figure 7.8: (a) The binary entropy function $H_b(p)$; (b) the capacity of a BSC with crossover probability p, given by $1 - H_b(p)$.


For a length n code of rate R bits/channel use, the number of codewords equals $2^{nR}$. The total number of binary sequences of length n, or the entire volume of the space we are working in, is $2^n$. Thus, if we wish to put a decoding sphere of volume $V_n$ around each codeword, the maximum number of codewords we can fit must satisfy

$$2^{nR} \le \frac{2^n}{V_n} \approx \frac{2^n}{2^{n H_b(p)}}$$

which gives

$$R \le 1 - H_b(p)$$
It can be rigorously demonstrated that the right-hand side actually equals the capacity of the BSC. We therefore state this result formally.

Capacity of BSC: The capacity of a BSC with crossover probability p is given by

$$C_{\text{BSC}}(p) = 1 - H_b(p) \text{ bits/channel use} \tag{7.17}$$

The  capacity  is  plotted  in  Figure  7.8(b).  We  note  the  following  points. 

• For $p = 1/2$, the channel is useless (its output does not depend on the input) and has capacity zero.

• For $p > 1/2$, we switch zeros and ones at the output to obtain an effective BSC with crossover probability $1 - p$, hence $C_{\text{BSC}}(p) = C_{\text{BSC}}(1 - p)$.

• For $p = 0$ or $p = 1$, the channel is perfect (for $p = 1$, after switching zeros and ones at the output), and the capacity attains its maximum value of 1 bit/channel use.

7.3.1  Design  Implications  of  Shannon  Limits 

Since the invention of turbo codes in 1993 and the subsequent rediscovery of LDPC codes, we now know how to construct random-looking codes that can be efficiently decoded (typically using iterative or message passing methods) and come extremely close to Shannon limits. Such “turbo-like” coded modulation strategies have made, or are making, their way into almost every digital communication technology, including cellular, WiFi, digital video broadcast, optical communication, magnetic recording, and flash memory. Thus, we often summarize the performance of a practical coded modulation scheme by stating how far away it is from the Shannon limit. Let us discuss what this means via an example. A rate 1/2 binary code is employed using bit interleaved coded modulation with a 16QAM alphabet. We are told that it operates 2 dB away from the Shannon limit at a BER of $10^{-5}$. What does this statement mean?

Since we can convey 4 coded bits every time we send a 16QAM symbol, the spectral efficiency is $r = \frac{1}{2} \times 4 = 2$ bps/Hz, or information bits per symbol (the product of the binary code rate, i.e., the number of information bits per coded bit, and the number of coded bits per symbol gives the number of information bits per symbol). From (7.11), the minimum possible $E_b/N_0$ is $(2^2 - 1)/2 = 1.5$, or about 1.8 dB. This is the minimum possible $E_b/N_0$ at which Shannon tells us that error-free operation (in the limit of large code blocklengths) is possible at the given spectral efficiency. Of course, any practical strategy at finite blocklength, no matter how large, will not give us error-free operation, hence we declare some value of error probability that we are satisfied with, and evaluate the $E_b/N_0$ for which that error probability is attained. Hence the statement that we started with says that our particular coded modulation strategy provides a BER of $10^{-5}$ at $E_b/N_0 = 1.8 + 2 = 3.8$ dB (2 dB higher than the Shannon limit).

How much gain does the preceding approach provide over uncoded communication? Let us
compare it with uncoded QPSK, which has the same spectral efficiency of 2 bps/Hz. Its BER
is given by $Q(\sqrt{2E_b/N_0})$, and we can check that a BER of $10^{-5}$ is attained at an
$E_b/N_0$ of 9.6 dB. Thus, we get a significant coding gain of 9.6 - 3.8 = 5.8 dB from using a
sophisticated coded modulation strategy: first expanding the constellation from QPSK to 16QAM so
that there is “room” to insert redundancy, and then using a powerful binary code.
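The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below assumes that (7.11) is the familiar AWGN bound $E_b/N_0 > (2^r - 1)/r$ (which gives about 1.8 dB for r = 2), computes $Q(x)$ from the standard `math.erfc`, and finds the uncoded QPSK operating point by bisection. The function names are illustrative, not taken from the text.

```python
import math

def shannon_min_ebno_db(r):
    """Minimum Eb/N0 (dB) at spectral efficiency r bps/Hz, assuming (7.11) is
    the AWGN bound Eb/N0 > (2^r - 1)/r."""
    return 10 * math.log10((2 ** r - 1) / r)

def qfunc(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def qpsk_ebno_db_for_ber(target_ber, lo_db=0.0, hi_db=20.0):
    """Eb/N0 (dB) at which uncoded QPSK, BER = Q(sqrt(2 Eb/N0)), reaches target_ber.
    Bisection works because the BER decreases monotonically in Eb/N0."""
    for _ in range(100):
        mid = 0.5 * (lo_db + hi_db)
        ber = qfunc(math.sqrt(2 * 10 ** (mid / 10)))
        if ber > target_ber:
            lo_db = mid      # not enough SNR yet: search higher
        else:
            hi_db = mid      # target met: search lower
    return 0.5 * (lo_db + hi_db)

r = 0.5 * 4                            # rate-1/2 code, 4 coded bits per 16QAM symbol
limit = shannon_min_ebno_db(r)         # Shannon limit at r = 2 bps/Hz
coded = limit + 2.0                    # "2 dB away from the Shannon limit"
uncoded = qpsk_ebno_db_for_ber(1e-5)   # uncoded QPSK at BER 1e-5
print(f"limit {limit:.2f} dB, coded {coded:.2f} dB, uncoded {uncoded:.2f} dB, "
      f"gain {uncoded - coded:.2f} dB")
```

Carrying the unrounded values through gives roughly 1.76 dB, 3.76 dB and 9.59 dB, which matches the rounded figures in the text.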

The specific approach for applying a turbo-like coded modulation strategy depends on the system
at hand. For systems with retransmissions (e.g., wireless data), we are often happy with block
error rates of 1% or even higher, and may be able to use these relatively relaxed specifications to
focus on reducing computational complexity and coding delay (e.g., by considering simpler codes
and smaller block lengths). For systems where there is no scope for retransmissions (e.g., storage
or broadcast), we may use longer block lengths, and may even layer an outer code to clean up the
residual errors from an inner turbo-like coded modulation scheme. Another common feature of
many systems is the use of adaptive coded modulation, in which the spectral efficiency is varied
as a function of the channel quality. BICM is particularly convenient for this purpose, since it
allows us to mix and match a menu of well-optimized binary codes at a range of different rates
with a menu of standard constellations (e.g., QPSK, 8PSK, 16QAM, 64QAM) to provide a large
number of options.
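The mix-and-match idea can be illustrated with a small mode-selection routine. This is only a sketch under stated assumptions, not a standardized procedure. The menu of code rates is hypothetical. Each mode with spectral efficiency $r = R \log_2 M$ is modeled as requiring the Shannon condition $E_s/N_0 \geq 2^r - 1$ (equivalent to $E_b/N_0 > (2^r - 1)/r$, since $E_s = r E_b$) plus an assumed fixed 2 dB gap, echoing the example above. The name `pick_mode` is illustrative.

```python
import math
from fractions import Fraction

# Hypothetical menus: the specific code rates are illustrative assumptions.
CODE_RATES = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)]
CONSTELLATIONS = {"QPSK": 2, "8PSK": 3, "16QAM": 4, "64QAM": 6}   # coded bits per symbol

def pick_mode(es_n0_db, gap_db=2.0):
    """Return (r, code rate, constellation) with the largest spectral efficiency
    r = R * log2(M) whose assumed requirement, Shannon's Es/N0 >= 2^r - 1 plus
    gap_db, is met at the given Es/N0. Returns None if no mode qualifies."""
    best = None
    for rate in CODE_RATES:
        for name, bits in CONSTELLATIONS.items():
            r = float(rate * bits)
            needed_db = 10 * math.log10(2 ** r - 1) + gap_db
            if needed_db <= es_n0_db and (best is None or r > best[0]):
                best = (r, rate, name)
    return best

print(pick_mode(20.0))   # a good channel supports a dense constellation and a high code rate
print(pick_mode(5.0))    # a poor channel forces a lower spectral efficiency
```

Ties in spectral efficiency (for example, rate 1/2 with 8PSK versus rate 3/4 with QPSK, both at 1.5 bps/Hz) are resolved here by menu order; a practical system would break them using measured performance of each mode.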

A  detailed  description  of  turbo-like  codes  is  beyond  our  present  scope,  but  we  do  provide  a 
discussion  of  decoding  via  message  passing  after  a  basic  exposition  of  linear  codes. 


7.4  Introducing  linear  codes 

After  decades  of  struggling  to  construct  practical  coding  strategies  that  approach  Shannon’s 
performance  limits,  we  can  now  essentially  declare  victory,  with  channel  codes  of  reasonable 
block  length  coming  within  less  than  a  dB  of  capacity.  We  refer  the  reader  to  more  advanced 
texts  and  the  research  literature  for  details  regarding  such  capacity-achieving  codes,  and  limit 
ourselves  here  to  establishing  some  basic  terminology  and  concepts  which  provide  a  roadmap  for 
further  exploration.  We  restrict  attention  to  linear  codes,  which  are  by  far  the  most  prevalent 
class  of  codes  in  use  today,  and  suffice  to  approach  capacity.  As  we  discuss  shortly,  a  linear  code 
is a subspace in a bigger vector space, but the arithmetic we use to define linearity is different
from  the  real-  and  complex-valued  arithmetic  we  are  used  to. 

Finite fields: We are used to doing calculations with real and complex numbers. The real
numbers, together with the rules of arithmetic, comprise the real field, and the complex numbers,
together with the rules of arithmetic, comprise the complex field. Each of these fields has
infinitely many elements, forming a continuum. However, the basic rules of arithmetic (addition,
multiplication, division by nonzero elements, and the associative, distributive and commutative
laws) can also be applied to fields with a finite number of discrete elements. Such fields are
called finite fields, or Galois fields (after the French mathematician who laid the foundations for
finite field theory). It turns out that, in order to be consistent with the basic rules of arithmetic,
the number of elements in a finite field must be a power of a prime, and we denote a finite field
with $p^m$ elements, where $p$ is a prime and $m$ a positive integer, as $GF(p^m)$. The theory of
finite fields, while outside our scope here, is essential for a variety of algebraic code constructions,
and we provide some pointers for further study later in this chapter. Our own discussion here is
restricted to codes over the binary field $GF(2)$.

Binary arithmetic: Binary arithmetic corresponds to operations with only two elements, 0
and 1, with addition modulo 2 as specified in (7.13). Binary subtraction is identical to binary
addition. Multiplication and division (division only permitted by nonzero elements) are trivial,
since the only nonzero element is 1. The usual associative, distributive and commutative laws
apply. The elements {0, 1}, together with these rules of binary arithmetic, are said to comprise
the binary field $GF(2)$.

Linear binary code: An (n, k) binary linear code C consists of $2^k$ possible codewords, each
of length n, such that adding any two codewords in binary arithmetic yields another codeword.
That is, C is closed under linear combinations (the coefficients of the linear combination can only
take values 0 or 1, since we are working in binary arithmetic), and is therefore a k-dimensional
subspace of the n-dimensional vector space of all length-n binary vectors, in a manner that is
entirely analogous to the concept of subspace in real-valued vector spaces. Pursuing this analogy
further, we can specify a linear code C completely by defining a basis with k vectors, such that
any vector in C (i.e., any codeword) can be expressed as a linear combination of the basis vectors.

Food  for  thought:  The  all-zero  codeword  is  always  part  of  any  linear  code.  Why? 

Notational convention: While we have preferred working with column vectors thus far, in
deference to the convention in most coding theory texts, we express codewords as row vectors.
Letting u and v denote two binary vectors of the same length, we denote by u ⊕ v their
component-by-component addition over the binary field. For example, (00110) ⊕ (10101) = (10011).
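In software, component-by-component addition over GF(2) is simply a bitwise XOR. The minimal sketch below works on bit strings for readability; the helper name `gf2_add` is illustrative.

```python
def gf2_add(u, v):
    """Component-by-component addition over GF(2) of two equal-length bit strings.
    Addition modulo 2 is XOR, and binary subtraction is the same operation."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(u, v))

print(gf2_add("00110", "10101"))  # 10011
```

Adding any vector to itself gives the all-zero vector, which is why subtraction and addition coincide in binary arithmetic.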

Example 7.4.1 (Repetition code) An (n, 1) repetition code has only two codewords, the all-one
codeword $x_1 = (1, \ldots, 1)$ and the all-zero codeword $x_0 = (0, \ldots, 0)$. We see that
$x_1 \oplus x_1 = x_0 \oplus x_0 = x_0$ and that $x_1 \oplus x_0 = x_0 \oplus x_1 = x_1$, so that this is
indeed a linear code. There are only $2^1$ codewords, so that the dimension k = 1. Thus, the code,
when viewed as a vector space over the binary field, is spanned by a single basis vector, $x_1$.
While the encoding operation is trivial for this code (just repeat the information bit n times), let
us write it in a manner that leads into a more general formalism. For example, for the (5, 1)
repetition code, the information bit $u \in \{0, 1\}$ is mapped to codeword x as follows:

$$x = uG$$

where

$$G = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \end{pmatrix} \tag{7.18}$$

is a matrix whose rows (just one row in this case) provide a basis for the code.


Example 7.4.2 (Single parity check code) An (n, n - 1) single parity check code takes as
input n - 1 unconstrained information bits $u = (u_1, \ldots, u_{n-1})$, maps them unchanged to n - 1
bits in the codeword, and adds a single parity check bit to obtain a codeword $x = (x_1, \ldots, x_n)$.

For example, we can set the first n - 1 code bits to the information bits
($x_1 = u_1, \ldots, x_{n-1} = u_{n-1}$) and append a parity check bit as follows:

$$x_n = x_1 \oplus x_2 \oplus \cdots \oplus x_{n-1}$$

Here, the code dimension k = n - 1, so that we can describe the code using n - 1 linearly
independent basis vectors. For example, for the (5, 4) single parity check code, a particular
choice of basis vectors, put as rows of a matrix, is as follows:

$$G = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} \tag{7.19}$$

We can now check that any codeword can be written as

$$x = uG$$

where $u = (u_1, \ldots, u_4)$ is the information bit sequence.


Generator matrix: While the preceding examples are very simple, they provide insight into
the general structure of linear codes. An (n, k) linear code can be represented by a basis with
k linearly independent vectors, each of length n. Putting these k basis vectors as the rows of a
$k \times n$ matrix G, we can then define a mapping from k information bits, represented as a
$1 \times k$ row vector u, to n code bits, represented as a $1 \times n$ row vector x, as follows:

$$x = uG \tag{7.20}$$

The matrix G is called the generator matrix for the code, since it can be used to generate all $2^k$
codewords by cycling through all possible values of the information vector u.
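Encoding via (7.20) amounts to XORing together the rows of G selected by the ones in u. The sketch below does exactly that over GF(2) and cycles through all $2^k$ information vectors, using the (5, 4) generator matrix (7.19) as a test case. The helper names `encode` and `all_codewords` are illustrative.

```python
from itertools import product

def encode(u, G):
    """x = uG over GF(2): add (XOR) the rows of G picked out by the 1s in u."""
    x = [0] * len(G[0])
    for bit, row in zip(u, G):
        if bit:
            x = [(a + b) % 2 for a, b in zip(x, row)]
    return tuple(x)

def all_codewords(G):
    """Generate all 2^k codewords by cycling through every information vector u."""
    return [encode(u, G) for u in product((0, 1), repeat=len(G))]

# Generator matrix (7.19) for the (5,4) single parity check code
G_spc = [[1, 0, 0, 0, 1],
         [0, 1, 0, 0, 1],
         [0, 0, 1, 0, 1],
         [0, 0, 0, 1, 1]]
codewords = all_codewords(G_spc)
print(len(codewords), all(sum(x) % 2 == 0 for x in codewords))  # 16 True
```

Every codeword has even weight, as expected for a single parity check code.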

Dual codes: Drawing again on our experience with real-valued vector spaces, we know that,
for any k-dimensional subspace C in an n-dimensional vector space, we can find an orthogonal
(n - k)-dimensional subspace $C^\perp$ such that every vector in C is orthogonal to every vector in
$C^\perp$. The subspace $C^\perp$ is itself an (n, n - k) code, and C and $C^\perp$ are said to be duals of
each other.
Example 7.4.3 (Duality of repetition and single parity check codes) It can be checked
that the (5, 4) single parity check code and the (5, 1) repetition code are duals of each other. That
is, each codeword in the (5, 4) code is orthogonal to each codeword in the (5, 1) code. Since
codewords are linear combinations of rows of the generator matrix, it suffices to check that each
row of a generator matrix for the (5, 4) code is orthogonal to each row of a generator matrix for
the (5, 1) code. Specifically,

$$G_{(5,1)} G_{(5,4)}^T = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 \end{pmatrix}$$
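The orthogonality check above is a matrix product over GF(2), in which each inner product is reduced modulo 2. A minimal sketch, with the illustrative helper name `gf2_mul_transpose`:

```python
def gf2_mul_transpose(A, B):
    """Compute A B^T over GF(2): entry (i, j) is the mod-2 inner product of
    row i of A with row j of B."""
    return [[sum(a * b for a, b in zip(ra, rb)) % 2 for rb in B] for ra in A]

G_rep = [[1, 1, 1, 1, 1]]          # (7.18): (5,1) repetition code
G_spc = [[1, 0, 0, 0, 1],
         [0, 1, 0, 0, 1],
         [0, 0, 1, 0, 1],
         [0, 0, 0, 1, 1]]          # (7.19): (5,4) single parity check code
print(gf2_mul_transpose(G_rep, G_spc))  # [[0, 0, 0, 0]]
```

Each row of (7.19) has exactly two ones, so its inner product with the all-one vector is 2, which is 0 modulo 2.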


Parity check matrix: The preceding discussion shows that we can describe an (n, k) linear
code C by specifying its dual code $C^\perp$. In particular, a generator matrix for the dual code serves
as a parity check matrix H for C, in the sense that an n-dimensional binary vector x lies in C if
and only if it is orthogonal to each row of H. That is,

$$Hx^T = 0 \quad \text{if and only if} \quad x \in C \tag{7.21}$$

Each row of the parity check matrix defines a parity check equation. Thus, for a parity check
matrix H of dimension $(n - k) \times n$, each codeword must satisfy n - k parity check equations.
Equivalently, if G is a generator matrix for C, then it must satisfy

$$HG^T = 0 \tag{7.22}$$

In our examples, the generator matrix for the (5, 1) repetition code is a parity check matrix for
the (5, 4) code, and vice versa.


For an (n, k) code with large n and k, it is clearly difficult to check, by brute-force enumeration
over all $2^k$ codewords, whether a particular n-dimensional vector y is a valid codeword.
For a linear code, however, it becomes straightforward to verify this using only n - k parity check
equations, as in (7.21). These parity check equations, which provide the redundancy required
to overcome channel errors, are important not only for verifying that decoding has terminated
correctly, but also for the decoding process itself, in which they play a crucial role, as we
illustrate shortly.


Non-uniqueness: An (n, k) linear code C is a unique subspace consisting of a set of $2^k$
codewords, and its dual (n, n - k) code is a unique subspace comprising $2^{n-k}$ codewords. However,
in general, neither the generator matrix nor the parity check matrix for a code is unique, since the
choice of basis for a nontrivial subspace is not unique. Thus, while the generator matrix for
the (5, 1) code is unique because of its trivial nature (one dimension, binary field), the generator
matrix for the (5, 4) code is not. For example, by taking linear combinations of the rows in (7.19),
we obtain another linearly independent basis that provides an alternative generator matrix for
the (5, 4) code:

(7.23) 


From (7.20), we see that different choices of generator matrices correspond to different ways of
encoding a k-dimensional information vector u into an n-dimensional codeword $x \in C$.


Systematic encoding: A systematic encoding is one in which the information vector u appears
directly in x (without loss of generality, we can take the bits of u to be the first k bits in x), so
that there is a clear separation between “information bits” and “parity check” bits. In this case,
the generator matrix can be written as

$$G = [I_k \mid P] \quad \text{(systematic encoding)} \tag{7.24}$$

where $I_k$ denotes the $k \times k$ identity matrix, and P is a $k \times (n - k)$ matrix specifying how the
n - k parity bits depend on the input. The identity matrix ensures that the k rows of G are linearly
independent, so this does represent a valid generator matrix for an (n, k) code. The ith row of
the generator matrix (7.24) corresponds to an information vector $u = (u_1, \ldots, u_k)$ with $u_i = 1$
and $u_j = 0$, $j \neq i$. Note that, even when we restrict the encoding to be systematic, the generator
matrix is not unique in general. The generator matrices (7.18) and (7.19) for the (5, 1) and (5, 4)
codes correspond to systematic encoding. The encoding of the (5, 4) code corresponding to the
generator matrix in (7.23) is not systematic.

Reading off a parity check matrix from a systematic generator matrix: If we are given
a systematic encoding of the form (7.24), we can easily read off a parity check matrix as follows:

$$H = [-P^T \mid I_{n-k}] \tag{7.25}$$

where the negative sign can be dropped for the binary field. The identity matrix ensures that
the n - k rows of H are linearly independent, hence this is a valid parity check matrix for an (n, k)
linear code. We leave it as an exercise to verify, by directly substituting from (7.24) and (7.25),
that $HG^T = 0$.
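Reading off (7.25) is mechanical enough to automate. The sketch below assumes G is already in the systematic form (7.24) (it does not check the identity block), builds $H = [P^T \mid I_{n-k}]$, and confirms $HG^T = 0$ for the (5, 4) code. The function names are illustrative.

```python
def parity_check_from_systematic(G):
    """Given a systematic G = [I_k | P] over GF(2) (assumed, not checked), return
    H = [P^T | I_{n-k}] as in (7.25); over GF(2) the minus sign is dropped."""
    k, n = len(G), len(G[0])
    P = [row[k:] for row in G]                      # k x (n-k) parity part
    H = []
    for j in range(n - k):
        pt_row = [P[i][j] for i in range(k)]         # row j of P^T
        id_row = [1 if m == j else 0 for m in range(n - k)]
        H.append(pt_row + id_row)
    return H

def gf2_mul_transpose(A, B):
    """A B^T over GF(2)."""
    return [[sum(a * b for a, b in zip(ra, rb)) % 2 for rb in B] for ra in A]

G_spc = [[1, 0, 0, 0, 1], [0, 1, 0, 0, 1], [0, 0, 1, 0, 1], [0, 0, 0, 1, 1]]  # (7.19)
H_spc = parity_check_from_systematic(G_spc)
print(H_spc)                            # [[1, 1, 1, 1, 1]]: the repetition code's generator
print(gf2_mul_transpose(H_spc, G_spc))  # [[0, 0, 0, 0]], i.e., H G^T = 0
```

The result recovers the duality of Example 7.4.3: the parity check matrix read off from (7.19) is the generator matrix (7.18).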


Example 7.4.4 (Running example: a (5, 2) linear code) Let us now construct a somewhat
less trivial linear code, which will serve as a running example for illustrating some basic concepts.
Suppose that we have k = 2 information bits $u_1, u_2 \in \{0, 1\}$ that we wish to protect. We map
these (using a systematic encoding) to a codeword of length 5 using a combination of repetition
and parity check, as follows:

$$x = (u_1, u_2, u_1, u_2, u_1 \oplus u_2) \tag{7.26}$$

A systematic generator matrix for this (5, 2) code can be constructed by considering the two
codewords corresponding to u = (1, 0) and u = (0, 1), respectively, which gives:

$$G = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \end{pmatrix} \tag{7.27}$$

We can read off a parity check matrix using (7.24) and (7.25) to obtain:

$$H = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix} \tag{7.28}$$

Any codeword $x = (x_1, \ldots, x_5)$ must satisfy $Hx^T = 0$, which corresponds to the following parity
check equations:

$$x_1 \oplus x_3 = 0$$
$$x_2 \oplus x_4 = 0$$
$$x_1 \oplus x_2 \oplus x_5 = 0$$
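A quick sanity check of the running example: encode all four information vectors with (7.26) and confirm that each resulting codeword satisfies the three parity check equations given by (7.28). The helper names `encode_running_example` and `syndrome` are illustrative.

```python
from itertools import product

def encode_running_example(u1, u2):
    """Systematic (5,2) encoding from (7.26): x = (u1, u2, u1, u2, u1 XOR u2)."""
    return (u1, u2, u1, u2, u1 ^ u2)

H = [[1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [1, 1, 0, 0, 1]]   # parity check matrix (7.28)

def syndrome(x, H):
    """H x^T over GF(2), one bit per parity check equation."""
    return tuple(sum(h * b for h, b in zip(row, x)) % 2 for row in H)

codewords = [encode_running_example(u1, u2) for u1, u2 in product((0, 1), repeat=2)]
for x in codewords:
    print("".join(map(str, x)), syndrome(x, H))   # each syndrome is (0, 0, 0)
```

A vector that is not a codeword, such as 10000, violates at least one equation: its syndrome is (1, 0, 1).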


Suppose, now, that we transmit the (5, 2) code that we have just constructed over a BSC with
crossover probability p. That is, we send a codeword $x = (x_1, \ldots, x_5)$ using the channel n = 5
times. According to our discrete memoryless channel model, errors occur independently for each
of the code symbols, and we get the output $y = (y_1, \ldots, y_5)$, where $y_i = x_i$ with probability
1 - p, and $y_i = x_i \oplus 1$ (i.e., the bit is flipped) with probability p. How should we try to
decode (i.e., estimate which codeword x was sent from the noisy output y)? And how do we
evaluate the performance of our decoding rule? In order to relate these questions to the structure
of the code, it is useful to reiterate the notion of Hamming distance, and to introduce the concept
of Hamming weight.

Hamming distance: The Hamming distance $d_H(u, v)$ between two binary vectors u and v of
equal length is the number of places in which they differ.

For example, the Hamming distance between the two rows of the generator matrix G in (7.27)
is given by $d_H(g_1, g_2) = 4$.

Hamming weight: The Hamming weight $w_H(u)$ of a binary vector u equals the number of
ones it contains.

For  example,  the  Hamming  weight  of  each  row  of  the  generator  matrix  G  in  (7.27)  is  3. 

The Hamming distance between two vectors u and v is the Hamming weight of their binary sum:

$$d_H(u, v) = w_H(u \oplus v) \tag{7.29}$$

Structure of an (n, k) linear code: Consider a specific codeword $x_0$ in a linear code C,
and consider its Hamming distance from another codeword $x \in C$. We know that
$d_H(x_0, x) = w_H(x_0 \oplus x)$. By linearity, $\tilde{x} = x_0 \oplus x$ is also a codeword in C, and distinct
choices of x give distinct codewords $\tilde{x}$. Thus, as we run through all possible codewords $x \in C$,
we obtain all possible codewords $\tilde{x} \in C$ (including $\tilde{x} = 0$ for $x = x_0$). Thus,
$d_H(x_0, x) = w_H(\tilde{x})$, so that the set of Hamming distances between $x_0$ and all codewords in C
(running through all $2^k$ choices of x) is precisely the set of weights that the codewords in C have
(corresponding to the $2^k$ distinct vectors $\tilde{x}$, one for each x).

Minimum distance: The minimum distance of a code is defined as

$$d_{\min} = \min_{x_1, x_2 \in C,\ x_1 \neq x_2} d_H(x_1, x_2)$$

Applying (7.29), and noting that, for a linear code, $x_1 \oplus x_2$ is a nonzero codeword in C, we see
that the minimum distance equals the minimum weight among all nonzero codewords. That is,

$$d_{\min} = w_{\min} = \min_{x \in C,\ x \neq 0} w_H(x), \quad \text{for a linear code} \tag{7.30}$$

The (5, 2) code is small enough that we can simply list all four codewords: 00000, 10101, 01011,
and 11110, from which we see that $w_{\min} = d_{\min} = 3$.

Guarantees on error correction: A code is guaranteed to correct t errors if

$$2t + 1 \leq d_{\min} \tag{7.31}$$

It is quite easy to see why. We can set up decoding spheres of radius t around each codeword; for
a codeword x, this is the set of all vectors y within Hamming distance t:

$$A(x) = \{y : d_H(y, x) \leq t\}$$

The condition (7.31) guarantees that these decoding spheres do not overlap: if some y lay in the
spheres around two distinct codewords $x_1$ and $x_2$, the triangle inequality would give
$d_H(x_1, x_2) \leq d_H(x_1, y) + d_H(y, x_2) \leq 2t < d_{\min}$, a contradiction. Thus, if we make at
most t errors, we are guaranteed that the received vector falls into the unique decoding sphere
corresponding to the transmitted codeword.
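For a code as small as the running example, both (7.30) and the sphere argument behind (7.31) can be checked exhaustively. The sketch below computes $d_{\min}$ as the minimum nonzero weight and directly from pairwise distances, takes the largest t with $2t + 1 \leq d_{\min}$, and tests sphere disjointness over all $2^5$ vectors. Function names are illustrative, and the brute-force checks are only practical for tiny codes.

```python
from itertools import product

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def min_distance(codewords):
    """d_min of a linear code via (7.30): minimum weight over nonzero codewords."""
    return min(sum(x) for x in codewords if any(x))

def min_distance_pairwise(codewords):
    """d_min straight from its definition: minimum distance over distinct pairs."""
    return min(hamming_distance(a, b)
               for i, a in enumerate(codewords) for b in codewords[i + 1:])

def spheres_disjoint(codewords, t):
    """True if no length-n vector lies within distance t of two different codewords."""
    n = len(codewords[0])
    return all(sum(hamming_distance(y, x) <= t for x in codewords) <= 1
               for y in product((0, 1), repeat=n))

C = [(0, 0, 0, 0, 0), (1, 0, 1, 0, 1), (0, 1, 0, 1, 1), (1, 1, 1, 1, 0)]
d_min = min_distance(C)
t = (d_min - 1) // 2     # largest t satisfying 2t + 1 <= d_min, as in (7.31)
print(d_min, min_distance_pairwise(C), t, spheres_disjoint(C, t))  # 3 3 1 True
```

With t = 2 the spheres do overlap: for instance, 11000 is at distance 2 from both 00000 and 11110.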

Erasures: There are some scenarios for which it is useful to introduce the concept of erasures,
which correspond to assigning a “don’t know” to a symbol rather than making a hard decision.
Using a similar argument as before, we can state that a code is guaranteed to correct t errors
and e erasures if

$$2t + e + 1 \leq d_{\min} \tag{7.32}$$

Since it is “twice as easy” to correct erasures as to correct errors, we may choose to design
a demodulator to put out erasures in regions where we are uncertain about our hard decision.
For a binary channel, this means that our input alphabet is {0, 1} but our output alphabet is
{0, 1, e}, where e denotes erasure. As we see in Section 7.5, we can go further down this path,
with the decoder using soft decisions which take values in a real-valued output alphabet.

Running example: Our (5, 2) code has $d_{\min} = 3$, and hence can correct 1 error or 2 erasures (but
not both). Let us see how we would structure brute-force decoding of a single error, by writing
down which vectors fall within decoding spheres of unit radius around each codeword, and also
pointing out which vectors are left over. This is done by writing all $2^5 = 32$ possible binary
vectors in what is termed a standard array.


00000   10101   01011   11110
-----------------------------
10000   00101   11011   01110
01000   11101   00011   10110
00100   10001   01111   11010
00010   10111   01001   11100
00001   10100   01010   11111
=============================
11000   01101   10011   00110
01100   11001   00111   10010

Table  7.1:  Standard  array  for  the  (5,2)  code 

Let us take advantage of this example to describe the general structure of a standard array
for an (n, k) linear code. The array has $2^{n-k}$ rows and $2^k$ columns, and contains all possible
binary vectors of length n. The first row of the array consists of the $2^k$ codewords, starting with
the all-zero codeword. The first column consists of error patterns ordered by weight (ties broken
arbitrarily), starting with no errors in the first row, $e_1 = 0$. In general, denoting the first element
of the ith row as the error pattern $e_i$, the jth element in the ith row is $a_{i,j} = e_i \oplus x_j$, where $x_j$
denotes the jth codeword, $j = 1, \ldots, 2^k$. That is, the (i, j)th element in the standard array is the
jth codeword translated by the ith error pattern. For the standard array in Table 7.1 for the (5, 2)
code, the first row consists of the four codewords. We demarcate it from all the other entries in
the table, which are not codewords, by a horizontal line. The next five rows correspond to the
five possible one-bit error patterns, which we know can be corrected. Thus, for the jth column,
the first six rows correspond to the decoding sphere of Hamming radius one around codeword $x_j$.
We demarcate this by drawing a double line under the sixth row. Beyond these, the first entries
of the remaining rows are arbitrarily set to be minimum weight binary vectors that have not
appeared yet. We cannot guarantee that we can correct these error patterns. For example, the
first and fourth entries in rows 7 and 8 are each equidistant from the first and fourth codewords,
hence none of these patterns can be mapped unambiguously to a decoding sphere.
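The construction just described can be written as a short loop. Scan candidate vectors in order of increasing weight, and whenever a vector has not yet appeared, make it the leader of a new row $e \oplus C$. Ties within a weight are broken arbitrarily; the sketch below assumes one particular rule (vectors listed by `product((1, 0), ...)`, so 10000 comes before 01000). The function name is illustrative.

```python
from itertools import product

def standard_array(codewords):
    """Rows are cosets e XOR C; leaders are taken as the lowest-weight vectors not
    yet used. Within a weight, ties follow the order of product((1, 0), ...)."""
    n = len(codewords[0])
    candidates = sorted(product((1, 0), repeat=n), key=sum)   # stable sort by weight
    used, rows = set(), []
    for e in candidates:
        if e in used:
            continue          # already in an earlier coset, so not a new leader
        row = [tuple(a ^ b for a, b in zip(e, x)) for x in codewords]
        used.update(row)
        rows.append(row)
    return rows

C = [(0, 0, 0, 0, 0), (1, 0, 1, 0, 1), (0, 1, 0, 1, 1), (1, 1, 1, 1, 0)]
for row in standard_array(C):
    print("   ".join("".join(map(str, v)) for v in row))
```

With this tie-breaking rule, the first seven rows match Table 7.1 exactly. The last row comes out with leader 10010 rather than 01100: it is the same coset (the same four vectors), with a different but equally valid weight-2 leader.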

Bounded distance decoding: For a code capable of correcting at least t errors, bounded distance
decoding at radius t corresponds to the following rule: decode a received word to the nearest
codeword (in terms of Hamming distance), as long as the distance is at most t, and declare
decoding failure if there is no such codeword. A conceptually simple, but computationally inefficient,
way to think about this is in terms of the standard array. For our running example in Table 7.1,
bounded distance decoding with t = 1 could be implemented by checking whether the received word is
anywhere in the first six rows, and if so, decoding it to the first element of the column it falls in.
For example, the received word 10001 is in the fourth row and second column, and is therefore
decoded to the second codeword 10101. If the received word is not in the first six rows,
then we declare decoding failure. For example, the received word 01101 is in the seventh row
and hence does not fall in the decoding sphere of radius one for any codeword, so we would
declare decoding failure if we received it.
Each row of the standard array is the translation of the code C by its first entry, $e_i$, and is called
a coset of the code. The first entry is called the coset leader for the ith coset, $i = 1, \ldots, 2^{n-k}$.
We now note that a coset can be described far more economically than by listing all its elements.
Applying a parity check matrix to the jth element of the ith coset gives $H(x_j \oplus e_i)^T = He_i^T$,
an answer that depends only on the coset leader, since $Hx^T = 0$ for any codeword x. We
therefore define the syndrome for the ith coset as $s_i = He_i^T$. The syndrome is a binary vector of
length n - k, and takes $2^{n-k}$ possible values. The coset leaders and syndromes corresponding to
Table 7.1, using the parity check matrix (7.28), are listed in Table 7.2.


Coset leader   Syndrome
00000          000
10000          101
01000          011
00100          100
00010          010
00001          001
11000          110
01100          111

Table  7.2:  Mapping  between  coset  leaders  and  syndromes  for  the  (5,  2)  code  using  (7.25) 
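Table 7.2 can be regenerated by applying H from (7.28) to each coset leader. The sketch also checks that every element of a coset shares its leader's syndrome, which is the property that makes the syndrome an economical label for the coset. The helper name `syndrome` is illustrative.

```python
H = [[1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [1, 1, 0, 0, 1]]   # parity check matrix (7.28)

def syndrome(v):
    """s = H v^T over GF(2) for a bit string v, returned as a bit string."""
    return "".join(str(sum(h * int(b) for h, b in zip(row, v)) % 2) for row in H)

leaders = ["00000", "10000", "01000", "00100", "00010", "00001", "11000", "01100"]
for e in leaders:
    print(e, syndrome(e))

# All elements of a coset share the leader's syndrome, e.g., row 7 of Table 7.1:
print([syndrome(v) for v in ["11000", "01101", "10011", "00110"]])  # ['110', '110', '110', '110']
```

The syndrome of a single-bit error pattern is simply the corresponding column of H, which is why all five single-error syndromes must be distinct and nonzero for single-error correction.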

Bounded distance decoding using syndromes: Consider a received word y. Compute the syndrome
$s = Hy^T$. If the syndrome corresponds to a coset leader e that is within the decoding sphere of
interest, then we estimate the transmitted codeword as $\hat{x} = y \oplus e$. Consider again the received
word y = 10001 and compute its syndrome $s = Hy^T = 100$. This corresponds to the fourth
row in Table 7.2, which we know is within a decoding sphere of radius one. The coset leader is
e = 00100. Adding this to the received word, we obtain $\hat{x} = y \oplus e = 10101$, which is the same
result that we obtained by direct look-up in the standard array.
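The syndrome decoder just described fits in a few lines. In the sketch below, the look-up table holds only the coset leaders inside the radius-1 decoding spheres (no error, or a single-bit error). Any other syndrome, here 110 or 111, triggers a declared decoding failure. The names `LEADER_OF` and `bounded_distance_decode` are illustrative.

```python
H = [[1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [1, 1, 0, 0, 1]]   # parity check matrix (7.28)

def syndrome(bits):
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)

n = 5
# Coset leaders within the radius-1 decoding spheres: no error, or one flipped bit.
correctable = [(0,) * n] + [tuple(int(j == i) for j in range(n)) for i in range(n)]
LEADER_OF = {syndrome(e): e for e in correctable}

def bounded_distance_decode(y):
    """Syndrome-based bounded distance decoding (t = 1) of a received bit string.
    Returns the codeword estimate as a bit string, or None for decoding failure."""
    bits = tuple(int(c) for c in y)
    e = LEADER_OF.get(syndrome(bits))
    if e is None:
        return None      # coset leader lies outside every radius-1 sphere
    return "".join(str(b ^ ei) for b, ei in zip(bits, e))

print(bounded_distance_decode("10001"))   # 10101
print(bounded_distance_decode("01101"))   # None
```

The table has $2^{n-k}$ entries at most, rather than the $2^n$ entries of the full standard array, but this still grows exponentially, which motivates the discussion that follows.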

Performance of bounded distance decoding: Correct decoding occurs if the received word is
mapped to the transmitted word. For bounded distance decoding with t = 1 for the (5, 2)
code, this happens if and only if there is at most one error. Thus, when a codeword for the (5, 2)
code is sent over a BSC with crossover probability p, the probability of correct decoding is given
by

$$P_c = (1 - p)^5 + \binom{5}{1} p (1 - p)^4$$

If the decoding is not correct, let us term the event incorrect decoding. One of two things
happens when the decoding is incorrect: either the received word falls outside the decoding spheres
of all codewords, in which case we declare decoding failure, or the received word falls inside the
decoding sphere of one of the incorrect codewords, and we have an undetected error. The sum of
the probabilities of these two events is $P_e = 1 - P_c$. Since decoding failure (where we know
something has gone wrong) is less damaging than undetected error (where we do not realize that
we have made errors), we would like its probability $P_{df}$ to be much larger than the probability
$P_{ue}$ of undetected error. For large block lengths n, we can typically design codes for which this
is possible, hence we often take $P_e$ as a proxy for the probability of decoding failure. For our
simple running example, we compute the probabilities of decoding failure and undetected error in
Problem 7.13. Exact computations of $P_{df}$ and $P_{ue}$ are difficult for more complex codes, hence we
typically resort to bounds and simulations.
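For a code this small, all three probabilities can be computed exactly by enumerating the $2^5$ error patterns, which makes a useful cross-check for Problem 7.13. Because the code is linear and the syndrome decoder only adds a coset leader, the sketch assumes without loss of generality that the all-zero codeword was sent, so the received word equals the error pattern. Names are illustrative.

```python
from itertools import product
from math import comb

H = [[1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [1, 1, 0, 0, 1]]   # parity check matrix (7.28)
n = 5

def syndrome(v):
    return tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)

# Radius-1 coset leaders: the all-zero pattern and the five single-bit patterns.
LEADER_OF = {syndrome(e): e for e in product((0, 1), repeat=n) if sum(e) <= 1}

def outcome_probabilities(p):
    """Exact (P_c, P_df, P_ue) for bounded distance decoding (t = 1) on a BSC(p),
    assuming the all-zero codeword was sent, so the received word is the error e."""
    pc = pdf = pue = 0.0
    for e in product((0, 1), repeat=n):
        w = sum(e)
        prob = p ** w * (1 - p) ** (n - w)
        leader = LEADER_OF.get(syndrome(e))
        if leader is None:
            pdf += prob      # outside every decoding sphere: declared failure
        elif leader == e:
            pc += prob       # decoded back to the transmitted codeword
        else:
            pue += prob      # inside the sphere of a wrong codeword: undetected error
    return pc, pdf, pue

p = 0.01
pc, pdf, pue = outcome_probabilities(p)
closed_form = (1 - p) ** 5 + comb(5, 1) * p * (1 - p) ** 4
print(pc, closed_form, pdf, pue)
```

The enumeration agrees with the closed-form expression for $P_c$. Decoding failure occurs exactly for the eight patterns in the last two rows of Table 7.1 (four of weight 2, four of weight 3).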

Even when we use syndromes to infer coset leaders rather than searching the entire standard
array, look-up based approaches to decoding do not scale well as we increase the code block
length n and the decoding radius. A significant achievement of classical coding theory has
been to construct codes whose algebraic structure can be exploited to devise efficient means
of mapping syndromes to coset leaders for bounded distance decoding (such methods typically
involve finding roots of polynomials over finite fields). However, much of the recent progress
in coding has resulted from the development of iterative decoding algorithms based on message
passing architectures, which permit efficient decoding of very long, random-looking codes that
can approach the Shannon limit. We now provide a simple illustration of message passing via our
running example of the (5, 2) code.


Figure 7.9: Tanner graph for (5,2) code with parity check matrix given by (7.28) (variable nodes
$x_1, \ldots, x_5$; check nodes $c_1, c_2, c_3$).


Tanner graph: A binary linear code with parity check matrix H can be represented as a Tanner
graph, with variable nodes representing the coded bits, and check nodes representing the parity
check equations. A variable node is connected to a check node by an edge if it appears
in that parity check equation. A Tanner graph for our running example (5, 2) code, based on
the parity check matrix (7.28), is shown in Figure 7.9. Check node $c_1$ corresponds to the parity
check equation specified by the first row of (7.28), $x_1 \oplus x_3 = 0$, and is therefore connected to $x_1$
and $x_3$. Check node $c_2$ corresponds to the second row, $x_2 \oplus x_4 = 0$, and is therefore connected to
$x_2$ and $x_4$. Check node $c_3$ corresponds to the third row, $x_1 \oplus x_2 \oplus x_5 = 0$, and is connected to $x_1$,
$x_2$, and $x_5$. The degree of a node is defined to be the number of edges incident on it. The variable
nodes $x_1, \ldots, x_5$ have degrees 2, 2, 1, 1, and 1, respectively. The check nodes $c_1, c_2, c_3$ have degrees
2, 2, and 3, respectively. The success of message passing on Tanner graphs is sensitive to these
degrees, as we shall see shortly.
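A Tanner graph is just the bipartite adjacency structure encoded in H: check node c is connected to variable node v whenever H[c][v] = 1. The sketch below builds both adjacency lists from (7.28) and reads off the node degrees quoted above. Indices are 0-based, so $x_1$ is index 0, and the function name is illustrative.

```python
H = [[1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [1, 1, 0, 0, 1]]   # parity check matrix (7.28)

def tanner_graph(H):
    """Adjacency lists of the Tanner graph (0-based indices):
    check_nbrs[c] = variable nodes in check c, var_nbrs[v] = checks containing v."""
    check_nbrs = [[v for v, h in enumerate(row) if h] for row in H]
    var_nbrs = [[c for c, row in enumerate(H) if row[v]] for v in range(len(H[0]))]
    return check_nbrs, var_nbrs

check_nbrs, var_nbrs = tanner_graph(H)
print([len(nb) for nb in var_nbrs])    # variable node degrees: [2, 2, 1, 1, 1]
print([len(nb) for nb in check_nbrs])  # check node degrees: [2, 2, 3]
```

Row weights of H give the check node degrees and column weights give the variable node degrees, so a different parity check matrix for the same code generally gives a different graph.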


Figure 7.10: Incoming and outgoing messages for a check node (the outgoing message shown is b ⊕ c).


Bit flipping based decoding: Let us now consider the following simple message passing algorithm, illustrated via the example in Figure 7.11. As shown in the example, each variable node maintains an estimate of the associated bit, initialized by what was received from the channel. In the particular example we consider, the received sequence is 10000. We know from Table 7.1 that a bounded distance decoder would map this to the codeword 00000. In message passing for bit flipping, each variable node sends out its current bit estimate on all outgoing edges. Each check node uses these incoming messages to generate new messages back to the variable nodes, as illustrated in Figure 7.10, which shows a check node of degree 3. That is, the message sent back to a variable node is the value that bit should take in order to satisfy that particular parity check, assuming that the messages coming in from the other variable nodes are correct; this is simply the modulo-2 sum of those other incoming bits. When the variable nodes get these messages, they flip their bits if "enough" check node messages tell them to. In our example of a (5, 2) code, let us employ the following rule: a variable node flips its channel bit if (a) all the check messages coming into it tell it to, and (b) the number of check messages is more than one (so as to provide enough evidence to override the current estimate).
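The flipping rule above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind Figures 7.11 and 7.12: the function name `bit_flip_decode`, the iteration cap `max_iters`, and the choice to stop as soon as no bit changes are all assumptions. Each check node sends each neighbor the modulo-2 sum of its other neighbors' current bits, and a variable node flips only if it receives more than one message and every message disagrees with its current estimate.

```python
def bit_flip_decode(H, received, max_iters=10):
    """Hard-decision bit flipping on the Tanner graph of H.

    Returns (estimate, success), where success means every parity check holds.
    """
    n = len(received)
    x = list(received)  # current bit estimates, initialized from the channel
    checks = [[j for j in range(n) if row[j]] for row in H]

    def satisfied(bits):
        return all(sum(bits[j] for j in c) % 2 == 0 for c in checks)

    for _ in range(max_iters):
        if satisfied(x):
            return x, True
        # msgs[j]: values that the check nodes ask variable node j to take
        msgs = [[] for _ in range(n)]
        for c in checks:
            for j in c:
                # value x[j] needs for this check, if the other bits are correct
                msgs[j].append(sum(x[k] for k in c if k != j) % 2)
        # flip only with more than one message, all disagreeing with x[j]
        new_x = [
            1 - x[j] if len(m) > 1 and all(v != x[j] for v in m) else x[j]
            for j, m in enumerate(msgs)
        ]
        if new_x == x:
            break  # no node can flip: the decoder is stuck
        x = new_x
    return x, satisfied(x)


H = [[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 1, 0, 0, 1]]
print(bit_flip_decode(H, [1, 0, 0, 0, 0]))  # ([0, 0, 0, 0, 0], True)
print(bit_flip_decode(H, [0, 0, 0, 0, 1]))  # ([0, 0, 0, 0, 1], False)
```

The two calls reproduce the behavior of the two single-error patterns discussed next: 10000 is corrected in one round, while for 00001 the degree-1 node $x_5$ never receives enough evidence to flip.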

Figure 7.11 shows how bit flipping can be used to correct the one-bit error pattern 10000. Both check node messages to variable node $x_1$ (from $C_1$ and $C_3$) say that it should take value 0, and cause it to flip to the correct value. On the other hand, Figure 7.12 shows that bit flipping gets stuck for the one-bit error pattern 00001, because there is only one check message coming into variable node $x_5$, which is not enough to flip it. Note that both of these error patterns are correctable using bounded distance decoding, using Table 7.1 or Table 7.2. This reveals an important property of iterative decoding on Tanner graphs: its success depends critically on the node degrees, which of course depend on the particular choice of parity check matrix used to specify the Tanner graph.


Figure  7.11:  Bit  flipping  based  decoding  for  the  (5,2)  code  is  successful  for  this  error  pattern. 


Can we fix the problem revealed by the example in Figure 7.12? Perhaps we can choose a different parity check matrix for which the Tanner graph has variable nodes of degree at least 2, so that bit flipping has a chance of working? For codes with large block lengths, it is actually possible to use a randomized approach for the design of parity check matrices yielding desirable degree distributions, enabling spectacular performance approaching Shannon limits. In these regimes, iterative decoding goes well beyond the error correction capability guarantees associated with the code's minimum distance. However, such results do not apply to the simple example we are considering, where iterative decoding has trouble decoding even up to the guarantee of $t = 1$ associated with a minimum distance $d_{min} = 3$. Nevertheless, this example gives us the opportunity to present a trick that can be useful even for large block lengths: use redundant parity check nodes, adding one or more rows to the parity check matrix that are linearly dependent on other rows. Figure 7.13 shows a Tanner graph for the (5, 2) code with a redundant check node $C_4$ corresponding to $x_3 \oplus x_4 \oplus x_5 = 0$. That is, we have added a fourth row 00111 to the parity check matrix (7.25). This row is the (modulo-2) sum of the first three rows, and hence would add no further information if we were just performing look-up based bounded distance decoding. However, revisiting the troublesome error pattern 00001, we see that this redundant check makes all the difference in the performance of bit flipping based decoding: variable node $x_5$ now receives two check messages (from $C_3$ and $C_4$), both of which tell it to take value 0. As Figure 7.14 shows, the pattern can now be corrected.

Figure 7.12: Bit flipping based decoding for the (5, 2) code is unsuccessful for this error pattern, even though it is correctable using bounded distance decoding.

Figure 7.13: Tanner graph for (5,2) code with one redundant parity check ($C_4$).
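The claim that the redundant row changes the Tanner graph but not the code can be checked by brute force. The Python sketch below (the helper name `codewords` is illustrative) forms the fourth row as the modulo-2 sum of the three rows of the parity check matrix used in the Tanner graph example, confirms that both matrices define the same four codewords, and recomputes the variable node degrees.

```python
from itertools import product

H = [[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 1, 0, 0, 1]]
redundant_row = [sum(col) % 2 for col in zip(*H)]  # C1 + C2 + C3 (mod 2)
H_red = H + [redundant_row]


def codewords(H):
    """All binary vectors x with H x^T = 0 (mod 2), by exhaustive search."""
    n = len(H[0])
    return {
        x
        for x in product((0, 1), repeat=n)
        if all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)
    }


print(redundant_row)                     # [0, 0, 1, 1, 1]
print(codewords(H) == codewords(H_red))  # True: the code itself is unchanged
print(sorted(codewords(H)))  # [(0, 0, 0, 0, 0), (0, 1, 0, 1, 1), (1, 0, 1, 0, 1), (1, 1, 1, 1, 0)]
print([sum(row[j] for row in H_red) for j in range(5)])  # [2, 2, 2, 2, 2]
```

With the extra check, every variable node has degree 2, which is exactly what gives the flipping rule enough evidence to act on $x_5$.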


7.5  Soft  decisions  and  belief  propagation 

We  have  discussed  decoding  of  linear  block  codes  based  on  hard  decision  inputs,  where  the  input 
to  the  decoder  is  a  string  of  bits.  However,  these  bits  are  sent  over  a  channel  using  modulation 
techniques  such  as  those  discussed  in  Chapter  4,  and  as  discussed  in  Chapter  6,  it  is  possible  to 
extract  soft  decisions  that  capture  more  of  the  information  we  have  about  the  physical  channel. 
In  this  section,  we  discuss  how  soft  decisions  can  be  used  in  iterative  decoding,  illustrating  the 
key  concepts  using  our  running  example  (5,  2)  code.  We  restrict  attention  to  BPSK  modulation 
over  a  discrete-time  AWGN  channel,  but  as  the  discussion  in  Chapter  6  indicates,  the  concept 
of  soft  decisions  is  applicable  to  any  signaling  scheme. 

Figure 7.14: Bit flipping based decoding for the (5,2) code using a redundant parity check is now successful for the 00001 error pattern.

A codeword $\mathbf{x} = (x[1], \ldots, x[n])$ with elements taking values in $\{0, 1\}$ can be mapped to a sequence of BPSK symbols using the transformation

$$b[m] = (-1)^{x[m]}, \quad m = 1, \ldots, n \tag{7.33}$$

The advantage of this map is that it transforms binary addition into real-valued multiplication. That is, $x[m_1] \oplus x[m_2]$ maps to $b[m_1]\,b[m_2]$. The BPSK symbols are transmitted over a discrete time AWGN channel, with received symbols given by

$$y[m] = A\,b[m] + w[m] = A(-1)^{x[m]} + w[m], \quad m = 1, \ldots, n \tag{7.34}$$


where the amplitude $A = \sqrt{E_s}$, where $E_s$ denotes the energy per symbol, and $w[m] \sim N(0, \sigma^2)$ are i.i.d. discrete time WGN samples. To simplify notation, consider a single bit $x \in \{0, 1\}$, mapped to $b \in \{-1, +1\}$, with received sample $y = Ab + w$, $w \sim N(0, \sigma^2)$. Consider the posterior probabilities $P[x = 0 \mid y]$ and $P[x = 1 \mid y]$. Since $P[x = 0 \mid y] + P[x = 1 \mid y] = 1$, we can convey information regarding these probabilities in a number of ways. One particularly convenient format is the log likelihood ratio (LLR), defined as


$$L(x) = \log \frac{P[x = 0]}{P[x = 1]} = \log \frac{P[b = +1]}{P[b = -1]} \tag{7.35}$$

where we omit the conditioning on $y$ to simplify notation. We can go from LLRs to bit probabilities as follows:

$$P[x = 0] = \frac{e^{L(x)}}{1 + e^{L(x)}}, \qquad P[x = 1] = \frac{1}{1 + e^{L(x)}} \tag{7.36}$$

We can go from LLRs to hard decisions as follows:

$$\hat{b} = \operatorname{sign}(L), \qquad \hat{x} = I_{\{L < 0\}} = I_{\{\hat{b} < 0\}} \tag{7.37}$$

where $\hat{b} \in \{-1, +1\}$ is the "BPSK" version of $\hat{x} \in \{0, 1\}$.
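The conversions (7.35) through (7.37) are one-liners in code. The Python sketch below is illustrative (the function names are ours); it evaluates (7.36) in a form that avoids overflow in `exp` for large $|L|$, and it assumes a tie-breaking convention for $L = 0$ (mapped to $\hat{b} = +1$, $\hat{x} = 0$), a case (7.37) leaves open.

```python
import math


def llr_from_probability(p0):
    """LLR (7.35): L = log(P[x = 0] / P[x = 1]), given p0 = P[x = 0] in (0, 1)."""
    return math.log(p0 / (1.0 - p0))


def probabilities_from_llr(L):
    """(7.36): P[x = 0] = e^L / (1 + e^L), P[x = 1] = 1 / (1 + e^L).

    The two branches are algebraically identical; each keeps exp() from overflowing.
    """
    if L >= 0:
        p0 = 1.0 / (1.0 + math.exp(-L))
    else:
        e = math.exp(L)
        p0 = e / (1.0 + e)
    return p0, 1.0 - p0


def hard_decision(L):
    """(7.37): b_hat = sign(L), x_hat = I{L < 0}; L = 0 is mapped to b_hat = +1 here."""
    x_hat = 1 if L < 0 else 0
    b_hat = -1 if x_hat else 1
    return b_hat, x_hat


L = llr_from_probability(0.9)
print(round(L, 4))                # 2.1972
print(probabilities_from_llr(L))  # approximately (0.9, 0.1)
print(hard_decision(-2.75))       # (-1, 1)
```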

Suppose that the prior probability of bit $x$ taking value 0 is $\pi_0(x)$. This notation implies that the prior probability could vary across bits. While we do not need this for the examples considered here, allowing this level of generality is useful for some decoder structures: for turbo codes, for instance, the information about bit $x$ supplied by one decoder component may be interpreted as its prior probability by another decoder component. We can now apply Bayes' rule to show (see Problem 7.17) that the LLR decomposes as follows:


$$L(x) = L_{prior}(x) + L_{channel}(x) \tag{7.38}$$

where

$$L_{prior}(x) = \log \frac{\pi_0(x)}{1 - \pi_0(x)} \tag{7.39}$$

and

$$L_{channel}(x) = \frac{2Ay}{\sigma^2} \tag{7.40}$$


Thus, the use of the logarithm enables an additive decomposition of information from independent sources, which is both intuitively pleasing and computationally useful. For our present purpose, we can assume that $x$ takes values from $\{0, 1\}$ with equal probability, so that $L_{prior} = 0$. The LLR $L(x)$ represents the strength of our belief in whether the bit is 0 or 1, and LLR-based message passing for iterative decoding is referred to as belief propagation.
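The channel term (7.40) comes from the ratio of the two Gaussian likelihoods $p(y \mid b = +1)$ and $p(y \mid b = -1)$, since the quadratic terms in $y$ cancel. The Python sketch below checks this numerically, together with the additive decomposition (7.38) for a non-uniform prior; the values of $A$, $\sigma^2$, $y$ and $\pi_0$ are arbitrary choices for illustration.

```python
import math


def gaussian_pdf(y, mean, var):
    """N(mean, var) density evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def channel_llr(y, A, var):
    """(7.40): L_channel = 2 A y / sigma^2, for y = A b + w with b = +1 <-> x = 0."""
    return 2 * A * y / var


def prior_llr(pi0):
    """(7.39): L_prior = log(pi0 / (1 - pi0)), where pi0 = P[x = 0]."""
    return math.log(pi0 / (1 - pi0))


A, var, y, pi0 = 1.0, 0.5, 0.3, 0.7  # illustrative values only

# Posterior LLR computed directly from Bayes' rule ...
posterior = math.log(
    pi0 * gaussian_pdf(y, A, var) / ((1 - pi0) * gaussian_pdf(y, -A, var))
)
# ... equals the sum of prior and channel LLRs, as in (7.38).
print(abs(posterior - (prior_llr(pi0) + channel_llr(y, A, var))) < 1e-12)  # True
print(channel_llr(y, A, var))  # 1.2
```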


Figure 7.15: Variable node update. With incoming check node messages $u_1, u_2, u_3$ and channel LLR $L_c$, the outgoing messages are $v_1 = u_2 + u_3 + L_c$, $v_2 = u_1 + u_3 + L_c$, and $v_3 = u_1 + u_2 + L_c$.


Belief propagation: We describe belief propagation over a Tanner graph for a linear block code by specifying message generation at a generic variable node and a generic check node. In belief propagation, the message going out on an edge is a function of the messages coming in on all of the other edges. At a variable node, all of the LLRs involved refer to a given bit, with information coming in from the channel and from check nodes. A key approximation in belief propagation is to treat all of these as independent sources of information, so that the corresponding LLRs add up. This is an excellent approximation for large block lengths, but it may not really apply to our small running example; we will nevertheless use it in our numerical examples. Figure 7.15 shows generation of an outgoing message from a variable node: the outgoing message on an edge is the sum of the incoming messages on all other edges (including from the channel as well as from the check nodes). Thus, the outgoing message on a given edge equals the sum of all incoming messages, minus the incoming message on that edge, and this is the way we implement it in the code fragment below. For simplicity, a node of degree three (not counting the edge coming from the channel) is shown in Figure 7.15, but the computation (and the code fragment implementing it) applies to variable nodes of arbitrary degrees.

Code  Fragment  7.5.1  Variable  Node  Update 

function Lout = variable_update(Lchannel, Lin)
% computes outgoing messages from a variable node
% Lchannel = LLR from channel for that variable
% Lin = vector of LLRs coming in from check nodes
% Lout = vector of LLRs going out to check nodes
% Note: dimension of Lin and Lout = variable node degree
%
% Outgoing message on an edge = sum of incoming messages on all other edges
% (including LLR from channel)
%
% Efficient computation: sum over all edges and subtract incoming message for each edge
Lout = sum(Lin) + Lchannel - Lin; % vector of the same dimension as Lin

Exercise: A variable node of degree 3 has channel LLR 0.25, and incoming LLR messages from check nodes −1.5, 0.5, −2.

(a) If you had to make a hard decision on the variable based on this information, what would it be?

(b) What are the outgoing messages back to the check nodes?

Answers: (a) The hard decision would be $\hat{x} = 1$ ($\hat{b} = -1$), since the total LLR $0.25 - 1.5 + 0.5 - 2 = -2.75$ is negative. (b) The outgoing messages to the check nodes are −1.25, −3.25, −0.75.
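For readers who want to check the exercise without Matlab, here is a direct Python transcription of Code Fragment 7.5.1, together with a hard decision based on the total LLR. The function names are illustrative; the computation is exactly "total minus the message on that edge".

```python
def variable_update(L_channel, L_in):
    """Outgoing variable node messages, as in Code Fragment 7.5.1.

    Each outgoing LLR is the total LLR (channel plus all incoming check
    messages) minus the incoming message on that same edge.
    """
    total = L_channel + sum(L_in)
    return [total - u for u in L_in]


def variable_decision(L_channel, L_in):
    """Hard decision (7.37) on the bit: 1 if the total LLR is negative, else 0."""
    return 1 if L_channel + sum(L_in) < 0 else 0


print(variable_update(0.25, [-1.5, 0.5, -2.0]))    # [-1.25, -3.25, -0.75]
print(variable_decision(0.25, [-1.5, 0.5, -2.0]))  # 1
```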


Figure 7.16: Check node update. The outgoing messages are computed from the incoming messages using the tanh rule: $\tanh(u_k/2) = \prod_{j \neq k} \tanh(v_j/2)$.


Message generation for check nodes, depicted in Figure 7.16, is more complicated. Consider a check node of degree three, corresponding to the parity check equation $x_1 \oplus x_2 \oplus x_3 = 0$. Suppose that the incoming messages tell us that the LLRs for these three bits are $v_1 = L_{in}(x_1)$, $v_2 = L_{in}(x_2)$, and $v_3 = L_{in}(x_3)$. Let us compute the outgoing message $u_3 = L_{out}(x_3)$ on the edge corresponding to variable $x_3$. We have

$$\begin{aligned} P_{out}[x_3 = 0] &= P_{in}[x_1 = 0, x_2 = 0] + P_{in}[x_1 = 1, x_2 = 1] \\ &= P_{in}[x_1 = 0]\,P_{in}[x_2 = 0] + P_{in}[x_1 = 1]\,P_{in}[x_2 = 1] \end{aligned}$$

where the second equality uses the independence approximation discussed above. Plugging in from (7.36), we obtain that

$$\frac{e^{L_{out}(x_3)}}{e^{L_{out}(x_3)} + 1} = \frac{e^{L_{in}(x_1) + L_{in}(x_2)} + 1}{\left(e^{L_{in}(x_1)} + 1\right)\left(e^{L_{in}(x_2)} + 1\right)} \tag{7.41}$$

As shown in Problem 7.18, this simplifies to

$$\tanh(u_3/2) = \tanh(v_1/2)\,\tanh(v_2/2) \tag{7.42}$$

We can decompose the preceding into (intermediate) hard decisions and reliabilities as follows:

$$\operatorname{sign}(u_3) = \operatorname{sign}(v_1)\,\operatorname{sign}(v_2) \tag{7.43}$$

$$\log|\tanh(u_3/2)| = \log|\tanh(v_1/2)| + \log|\tanh(v_2/2)| \tag{7.44}$$

Figure 7.16 illustrates the update for a check node of degree 3. However, these computations generalize to a check node of arbitrary degree, as implemented in the following code fragment.
Code Fragment 7.5.2 Check Node Update

function Lout = check_update(Lin)
% computes messages going out from a check node
% Lin = vector of messages coming in from variable nodes
% Lout = vector of messages going out to variable nodes
% (assumes no incoming LLR is exactly zero, since log(0) = -Inf)

% convert LLRs to reliabilities and signs
reliabilities_in = log(abs(tanh(Lin/2)));
signs_in = sign(Lin);

% compute check update:
% outgoing reliability on an edge = sum of incoming reliabilities on all other edges
reliabilities_out = sum(reliabilities_in) - reliabilities_in;
% outgoing sign on an edge = product of the other incoming signs
% (multiplying the full product by an edge's own sign removes it, since sign^2 = 1)
sign_product = prod(signs_in);
signs_out = sign_product.*signs_in;

% convert reliabilities and signs back to LLRs
Lout = 2*atanh(exp(reliabilities_out)).*signs_out;

Exercise: A check node of degree 4 has incoming LLRs −3.5, 2.2, 0.25, 1.3.

(a) Is the check satisfied by the incoming messages? That is, if we made hard decisions based on the incoming LLRs, would they satisfy the parity check equation corresponding to this node?

(b) Use Code Fragment 7.5.2 to determine the corresponding outgoing LLRs. How are the signs and reliabilities of the outgoing LLRs related to those of the incoming messages?

Answers: (a) No. (b) The outgoing LLRs are 0.1139, −0.1340, −0.9217, −0.1880. The signs are flipped, and the larger reliabilities become smaller, while the smallest reliability increases. Why does this make sense?
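The same update can be written directly in the product form of the tanh rule in Figure 7.16, without the log-domain split used in Code Fragment 7.5.2. The Python sketch below (function names are illustrative) reproduces the exercise answers, and also checks for a degree 3 node that the tanh rule agrees with the probability-domain expression for $P_{out}[x_3 = 0]$ from which it was derived. Clipping the product slightly inside $(-1, 1)$ is our own safeguard so that `atanh` stays finite for very large LLRs.

```python
import math


def check_update(v):
    """Tanh rule: u_k = 2 atanh( prod_{j != k} tanh(v_j / 2) ) for each edge k."""
    t = [math.tanh(x / 2) for x in v]
    out = []
    for k in range(len(v)):
        prod = 1.0
        for j, tj in enumerate(t):
            if j != k:
                prod *= tj
        prod = max(-1 + 1e-12, min(1 - 1e-12, prod))  # keep atanh finite
        out.append(2 * math.atanh(prod))
    return out


def p_zero(L):
    """P[x = 0] = e^L / (1 + e^L), written equivalently as (1 + tanh(L/2)) / 2."""
    return 0.5 * (1 + math.tanh(L / 2))


def check_update_direct(v1, v2):
    """Outgoing LLR u3 for a degree-3 check, via P_out[x3 = 0] and the LLR definition."""
    q0 = p_zero(v1) * p_zero(v2) + (1 - p_zero(v1)) * (1 - p_zero(v2))
    return math.log(q0 / (1 - q0))


u = check_update([-3.5, 2.2, 0.25, 1.3])
print([round(x, 4) for x in u])  # [0.1139, -0.134, -0.9217, -0.188]
# The tanh rule and the probability-domain computation agree:
print(abs(check_update([0.7, -1.3, 0.0])[2] - check_update_direct(0.7, -1.3)) < 1e-12)
```

Note how the sign pattern of the output is the reverse of the input: the hard decisions violate the check, so each edge is told to move toward the opposite bit, most strongly on the edge that was least reliable.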

Once we have defined the computations at the variable and check nodes, all that is needed to implement belief propagation is to route messages according to the edges defined by a given parity check matrix (of course, the choice of code and parity check matrix determines whether iterative decoding is effective). At any stage of iterative decoding, we can make hard decisions at a variable node using (7.37), where the LLR is the sum of all incoming LLRs, including the channel LLR. If the resulting estimated vector $\hat{\mathbf{x}}$ satisfies $H\hat{\mathbf{x}}^T = \mathbf{0}$, then we know that we have obtained a valid codeword and we can terminate the decoding. Typically, if we do not obtain a valid codeword after a specified number of iterations, then we declare decoding failure. We implement belief propagation based iterative decoding in Software Lab 7.1; while we use our running example (5, 2) code, the software developed in this lab provides a generic implementation of belief propagation for any linear block code once the parity check matrix is specified.
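Putting the two node updates together, the decoding loop just described can be sketched in Python. This is only an illustration under stated assumptions, not the Software Lab 7.1 code: it uses a flooding schedule (all check nodes are updated from the same set of variable messages in each iteration), the names `bp_decode` and `check_update` are hypothetical, and the tanh product is clipped slightly inside $(-1, 1)$ so that `atanh` stays finite.

```python
import math


def check_update(v):
    """Tanh rule: u_k = 2 atanh(prod_{j != k} tanh(v_j / 2)) for each edge k."""
    t = [math.tanh(x / 2) for x in v]
    out = []
    for k in range(len(v)):
        prod = 1.0
        for j, tj in enumerate(t):
            if j != k:
                prod *= tj
        prod = max(-1 + 1e-12, min(1 - 1e-12, prod))  # keep atanh finite
        out.append(2 * math.atanh(prod))
    return out


def bp_decode(H, L_channel, max_iters=20):
    """Belief propagation on the Tanner graph of H (flooding schedule).

    Returns (x_hat, iterations_used, success), where success means H x_hat^T = 0.
    """
    n = len(H[0])
    checks = [[j for j in range(n) if row[j]] for row in H]
    # u[(i, j)]: LLR sent from check node i to variable node j; starts at zero.
    u = {(i, j): 0.0 for i, c in enumerate(checks) for j in c}

    def total_llr(msgs):
        # Channel LLR plus all incoming check messages, per variable node.
        total = list(L_channel)
        for (_, j), val in msgs.items():
            total[j] += val
        return total

    x_hat = [1 if L < 0 else 0 for L in L_channel]
    for it in range(1, max_iters + 1):
        total = total_llr(u)
        new_u = {}
        for i, c in enumerate(checks):
            # Variable-to-check messages: total LLR minus the message on that edge.
            v = [total[j] - u[(i, j)] for j in c]
            for j, val in zip(c, check_update(v)):
                new_u[(i, j)] = val
        u = new_u
        # Hard decisions (7.37) from the updated total LLRs.
        x_hat = [1 if L < 0 else 0 for L in total_llr(u)]
        if all(sum(x_hat[j] for j in c) % 2 == 0 for c in checks):
            return x_hat, it, True
    return x_hat, max_iters, False


H = [[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 1, 0, 0, 1]]
# Weak wrong-sign LLR on x1: channel hard decisions alone would give 10000.
print(bp_decode(H, [-0.5, 2.0, 2.0, 2.0, 2.0]))  # ([0, 0, 0, 0, 0], 1, True)
```

In this example the messages from $C_1$ and $C_3$ to $x_1$ outweigh its weak negative channel LLR, so a single iteration already yields a valid codeword and the loop stops.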


7.6  Concept  inventory  on  channel  coding 

This  section  provides  a  glimpse  of  channel  coding  concepts,  including  fundamental  performance 
limits  established  by  Shannon  theory  and  constructive  strategies  for  approaching  these  limits. 
Key  points  are  summarized  as  follows. 

•  The  need  for  non-trivial  channel  codes  is  motivated  by  examining  two  extremes  when  sending 
a  block  of  bits  over  a  binary  symmetric  channel:  uncoded  communication  (probability  of  packet 
error  tends  to  one  as  blocklength  increases)  and  repetition  coding  (the  code  rate  tends  to  zero 
as  blocklength  increases). 

• Channel coding consists of introducing structured redundancy in the transmitted bits/symbols. While there are many possible coded modulation strategies, we focus on BICM, a simple, flexible, and effective approach cascading a binary code and an interleaver, followed by mapping of bits to modulated symbols.

Shannon  limits 

• For a given channel (fixing parameters such as power and bandwidth), Shannon theory tells us that there is a well-defined maximum possible rate of reliable communication, termed the channel capacity. For a passband bandlimited AWGN channel with bandwidth $W$ (Hz), the capacity is given by $W \log_2(1 + E_s/N_0)$ bits per second, which translates to the following fundamental power-bandwidth tradeoff:

$$E_s/N_0 > 2^r - 1, \qquad E_b/N_0 > \frac{2^r - 1}{r}$$

where $E_s$ is the energy per transmitted symbol, $E_b$ is the energy per information bit, and $r$ is the spectral efficiency (the information bit rate normalized by the bandwidth). These results were derived after first showing that the capacity of a discrete time real AWGN channel is given by $\frac{1}{2}\log_2(1 + S/N)$ bits per channel use.

• The channel capacity for a BSC with crossover probability $p$ is $1 - H_B(p) = 1 + p\log_2 p + (1 - p)\log_2(1 - p)$ bits per channel use. For BICM, such a channel is obtained, for example, by making hard decisions on Gray coded constellations.

• Shannon limits can be used as guidelines for system sizing: for example, for choosing the combination of code rate and constellation size that is appropriate for a given SNR.

• The performance of a given coded modulation strategy can be compared to fundamental limits by comparing the SNR at which it attains a certain performance (e.g., a target BER) with the minimum SNR required for reliable communication at that spectral efficiency.
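The formulas in these bullets are easy to evaluate numerically when sizing a link. The Python sketch below (function names are illustrative) computes the minimum $E_s/N_0$ and $E_b/N_0$ in dB at spectral efficiency $r$, and the BSC capacity $1 - H_B(p)$ using the convention $0 \log_2 0 = 0$.

```python
import math


def min_esn0_db(r):
    """Minimum Es/N0 (dB) for reliable communication at spectral efficiency r: 2^r - 1."""
    return 10 * math.log10(2 ** r - 1)


def min_ebn0_db(r):
    """Minimum Eb/N0 (dB) at spectral efficiency r: (2^r - 1) / r."""
    return 10 * math.log10((2 ** r - 1) / r)


def binary_entropy(p):
    """H_B(p) in bits, with 0 log2 0 taken to be 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p):
    """Capacity of a BSC with crossover probability p, in bits per channel use."""
    return 1 - binary_entropy(p)


for r in (0.5, 1, 2, 4):
    print(f"r = {r}: Es/N0 > {min_esn0_db(r):.2f} dB, Eb/N0 > {min_ebn0_db(r):.2f} dB")
print(f"BSC capacity at p = 0.11: {bsc_capacity(0.11):.3f}")  # about 0.500
```

For example, $r = 1$ bit/s/Hz requires $E_b/N_0 > 0$ dB, while $r = 2$ requires about 1.76 dB; the gap between such a limit and the SNR a real scheme needs is the figure of merit described in the last bullet.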

Linear  codes 

• Linear codes are a popular and well-understood design choice in modern communication systems. The $2^k$ codewords in an $(n, k)$ binary linear code $\mathcal{C}$ form a $k$-dimensional subspace of the space of $n$-dimensional binary vectors, under addition and multiplication over the binary field. The dual code $\mathcal{C}^\perp$ is an $(n, n - k)$ linear code such that each codeword in $\mathcal{C}$ is orthogonal (under binary inner products) to each codeword in $\mathcal{C}^\perp$.

• A basis for an $(n, k)$ linear code $\mathcal{C}$ can be used to form a generator matrix $G$. A $k$-dimensional information vector $\mathbf{u}$ can be encoded into an $n$-dimensional codeword $\mathbf{x}$ using the generator matrix: $\mathbf{x} = \mathbf{u}G$.

• A basis for the dual code $\mathcal{C}^\perp$ can be used to form a parity check matrix $H$ satisfying $H\mathbf{x}^T = \mathbf{0}$ for any $\mathbf{x} \in \mathcal{C}$.

•  The  choices  for  G  and  H  are  not  unique,  since  the  choice  of  basis  for  a  linear  vector  space  is 
not  unique.  Furthermore,  we  may  add  redundant  rows  to  H  to  aid  in  decoding. 

• The number of errors $t$ that a code can be guaranteed to correct satisfies $2t + 1 \leq d_{min}$, where $d_{min}$ is the minimum Hamming distance between codewords. For a linear code, $d_{min}$ equals the minimum weight among nonzero codewords, since the all-zero vector is always a codeword, and since the difference between codewords is a codeword.

• The translation of codewords by error vectors can be enumerated in a standard array, whose rows correspond to translations of the entire code, termed cosets, by a given error pattern, termed the coset leader. A more compact representation lists only coset leaders and syndromes, obtained by applying the parity check matrix to a given received word. These can be used to carry out a look-up based implementation of bounded distance decoding.
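To make the syndrome look-up concrete, here is a Python sketch for the running (5, 2) code. It assumes the parity check matrix with rows 10100, 01010, 11001 (the check equations of the Tanner graph example) and a generator matrix with rows 10101 and 01011, which we derived from those checks; these need not be the particular $G$ and $H$ used to build Tables 7.1 and 7.2, and all names are illustrative. Only the all-zero pattern and the five single-error patterns serve as coset leaders, so any other syndrome is declared a decoding failure, as bounded distance decoding with $t = 1$ prescribes.

```python
from itertools import product

H = [[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 1, 0, 0, 1]]
G = [[1, 0, 1, 0, 1], [0, 1, 0, 1, 1]]
n = 5


def syndrome(r):
    """s = H r^T over the binary field."""
    return tuple(sum(h * b for h, b in zip(row, r)) % 2 for row in H)


# Codewords x = u G for all 2^k information vectors u.
codewords = {
    tuple(sum(u[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n))
    for u in product((0, 1), repeat=len(G))
}
# For a linear code, d_min is the minimum weight of a nonzero codeword.
d_min = min(sum(c) for c in codewords if any(c))

# Coset leaders of weight <= 1 and their syndromes (decoding radius t = 1).
leaders = [(0,) * n] + [tuple(int(k == j) for k in range(n)) for j in range(n)]
table = {syndrome(e): e for e in leaders}


def bounded_distance_decode(r):
    """Return the decoded codeword, or None to declare a decoding failure."""
    e = table.get(syndrome(r))
    if e is None:
        return None
    return tuple((a + b) % 2 for a, b in zip(r, e))


print(d_min)                                     # 3
print(bounded_distance_decode((1, 0, 0, 0, 0)))  # (0, 0, 0, 0, 0)
print(bounded_distance_decode((1, 1, 0, 0, 0)))  # None
```

The six syndromes used here are distinct (the columns of $H$ are distinct and nonzero), which is exactly the condition for every single error to be correctable; the two remaining syndromes correspond to the "left over" weight-2 patterns.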

Tanner  graphs 

• An $(n, k)$ linear code with an $r \times n$ ($r \geq n - k$) parity check matrix $H$ can be represented by a Tanner graph, with $n$ variable nodes on one side and $r$ check nodes on the other side, with an edge between the $j$th variable node and the $i$th check node if and only if $H(i, j) = 1$.

•  Message  passing  on  the  Tanner  graph  can  be  used  for  iterative  decoding,  which  scales  well 
to  very  large  code  block  lengths.  One  approach  is  to  employ  bit  flipping  algorithms  with  hard 
decision  inputs  and  binary  messages,  but  a  more  powerful  approach  is  to  use  soft  decisions  and 
belief  propagation. 

Soft  decisions  and  belief  propagation 

•  The  messages  passed  between  the  variable  and  check  nodes  are  the  bit  LLRs.  The  message 
going  out  on  an  edge  depends  on  the  messages  coming  in  on  all  the  other  edges. 
• Outgoing messages from a variable node are generated simply by summing LLRs. Outgoing messages from a check node are more complicated, but can be viewed as a product of signs and a sum of reliabilities, as in (7.43) and (7.44).


7.7  Endnotes 

The  material  in  this  chapter  has  been  selected  to  make  two  points:  (a)  information  theory 
provides  fundamental  performance  benchmarks  that  can  be  used  to  guide  parameter  selection 
for  communication  links;  (b)  coding  theory  provides  constructive  strategies  for  approaching  these 
fundamental  benchmarks.  We  now  list  some  keywords  associated  with  topics  that  a  systematic 
exposition  of  information  and  coding  theory  might  cover,  and  then  provide  some  references  for 
further  study. 

Keywords:  A  systematic  study  of  information  theory,  and  its  application  to  derive  theorems  in 
source  and  channel  coding,  includes  concepts  such  as  entropy,  mutual  information,  divergence 
and  typicality.  A  systematic  study  of  the  structure  of  algebraic  codes,  such  as  BCH  and  RS  codes, 
is  required  to  understand  their  construction,  their  distance  properties  and  decoding  algorithms 
such  as  the  Berlekamp-Massey  algorithm.  A  study  of  convolutional  codes,  their  decoding  using 
the  Viterbi  algorithm,  and  their  performance  analysis,  is  another  important  component  of  a 
study  of  channel  coding.  Tight  integration  of  convolutional  codes  with  modulation  leads  to 
trellis  coded  modulation.  Suitably  interleaving  convolutional  codes  leads  to  turbo  codes,  which 
can  be  decoded  iteratively  using  the  forward-backward,  or  BCJR,  algorithm.  LDPC  codes,  which 
can  be  decoded  iteratively  by  message  passing  over  a  Tanner  graph  (as  described  here  and  in 
Software Lab 7.1), are of course an indispensable component of modern communication design.

Further  reading:  One  level  up  from  the  glimpse  provided  here  is  a  self-contained  introduction 
to  “just  enough”  information  theory  to  compute  performance  benchmarks  for  communication 
systems,  and  a  selection  of  constructive  coding  and  decoding  strategies  (including  convolutional, 
turbo,  and  LDPC  codes),  in  the  author’s  graduate  text  [7]  (Chapters  6  and  7).  The  textbook 
by  Cover  and  Thomas  [36]  is  highly  recommended  for  a  systematic  and  lucid  exposition  of 
information theory. Shannon's beautifully written work [37] establishing the field is also highly recommended as a source of inspiration. The textbook by McEliece [38] is a good source for a first
exposure  to  information  theory  and  algebraic  coding.  A  detailed  treatment  of  algebraic  coding 
is  provided  by  the  textbook  by  Blahut  [39],  while  comprehensive  treatments  of  channel  coding, 
including  both  algebraic  and  turbo-like  codes,  are  provided  in  the  texts  by  Lin  and  Costello  [40] 
and  Moon  [41]. 


7.8  Problems 

Shannon  limits 

Problem  7.1  Consider  a  coded  modulation  strategy  pairing  a  rate  |  binary  code  with  QPSK. 
Assuming that this scheme performs 1.5 dB away from the Shannon limit, what are the minimum values of $E_s/N_0$ (dB) and $E_b/N_0$ (dB) required for the scheme to work?

Problem  7.2  At  BER  of  10“®,  how  far  away  are  the  following  uncoded  constellations  from 
the  corresponding  Shannon  limits:  QPSK,  8PSK,  16QAM,  64QAM.  Use  the  nearest  neighbors 
approximation  for  BER  of  Gray  coded  constellations  in  Section  6.4. 

Problem  7.3  Consider  Gray  coded  QPSK,  8PSK,  16QAM,  and  64QAM. 

(a) Assuming that we make ML hard decisions, use the nearest neighbors approximation for BER of Gray coded constellations in Section 6.4 to plot the BER as a function of $E_s/N_0$ (dB) for each of these constellations.

(b) The hard decisions induce a BSC with crossover probability given by the BERs computed in (a). Using the BSC capacity formula (7.17), plot the capacity in bits per symbol as a function of $E_s/N_0$ (dB) for each constellation. Also plot for comparison the capacity of the bandlimited AWGN channel given by (). Comment on the penalty for hard decisions, as well as any other trends that you see.


Problem  7.4  Consider  a  BICM  system  employing  a  rate  |  binary  code  with  Gray  coded  QPSK 
modulation. 

(a) What is $E_s/N_0$ in terms of $E_b/N_0$?

(b) Based on the AWGN capacity region (7.11), what is the Shannon limit for this system (i.e., the minimum required $E_b/N_0$ in dB)?

(c)  Now,  consider  the  suboptimal  strategy  of  making  hard  decisions,  thus  inducing  a  BSC.  What 
is  the  Shannon  limit  for  the  system?  What  is  the  degradation  in  dB  due  to  making  hard  decisions? 
Hint: Hard decisions on Gray coded QPSK symbols induce a BSC with crossover probability $p = Q\left(\sqrt{E_s/N_0}\right)$, whose capacity is given by (7.17). The Shannon limit is the minimum value of $E_b/N_0$ for the capacity to be larger than the code rate being used.

Problem  7.5  A  rate  1/2  binary  code  is  employed  using  bit  interleaved  coded  modulation  with 
QPSK,  16QAM,  and  64QAM. 

(a) What are the bit rates attained by these three schemes when operating over a passband channel of bandwidth 10 MHz (ignore excess bandwidth)?

(b) Assuming that each coded modulation scheme operates 2 dB from the Shannon limit, what is the minimum value of $E_s/N_0$ (dB) required for each of the three schemes to provide reliable communication?

(c) Suppose that these three schemes are employed in an adaptive modulation strategy which adapts the data rate as a function of the range, and that the largest attainable range among the three schemes is 10 km. Assuming inverse square path loss, what are the ranges corresponding to the other two schemes?

(d)  Now,  if  we  add  binary  codes  of  rate  |  and  |,  plot  the  attainable  bit  rate  versus  range  for 
an  adaptive  modulation  scheme  allowing  all  possible  pairings  of  code  rates  and  constellations. 
Assume  that  each  scheme  is  2  dB  away  from  the  corresponding  Shannon  limit. 

Problem 7.6 (a) Apply L'Hospital's rule to evaluate the limit of the right-hand side of (7.11) as $r \to 0$. What is the minimum possible $E_b/N_0$ in dB at which reliable communication is possible over the AWGN channel?

(b) Re-plot the region for reliable communication shown in Figure 7.6, but this time with spectral efficiency $r$ (bps/Hz) versus the SNR $E_s/N_0$ (dB). Is there any lower limit to $E_s/N_0$ below which reliable communication is not possible? If so, what is it? If not, why not?


Linear  codes  and  bounded  distance  decoding 

Problem 7.7 A parity check matrix for the (7,4) Hamming code is given by

$$H = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{pmatrix} \tag{7.45}$$

(a)  Find  a  generator  matrix  for  the  code. 

(b) Find the minimum distance of the code. How many errors can be corrected using bounded distance decoding?

Answer:  dmin  =  3,  hence  a  bounded  distance  decoder  can  correct  one  error. 

(c) Write down the standard array. Comment on any structural differences you see between this and the standard array for the (5, 2) code in Table 7.1.

Answer:  Unlike  in  Table  7.1,  no  binary  vectors  are  “left  over”  after  running  through  the  single 
error  patterns.  The  Hamming  code  is  a  perfect  code:  the  decoding  spheres  of  radius  one  cover 
the  entire  space  of  length-7  binary  vectors.  “Perfect”  in  this  case  just  refers  to  how  well  decoding 
spheres  can  be  packed  into  the  available  space;  it  definitely  does  not  mean  “good,”  since  the 
Hamming  code  is  a  weak  code. 

(d)  Write  down  the  mapping  between  coset  leaders  and  syndromes  for  the  given  parity  check 
matrix  (as  done  in  Table  7.2  for  the  (5,2)  code). 

Problem 7.8 Suppose that the (7, 4) Hamming code is used over a BSC with crossover probability $p = 0.01$. Assuming that bounded distance decoding with decoding radius one is employed, find the probability of correct decoding, the probability of decoding failure, and the probability of undetected error.

Problem 7.9 Append a single parity check to the (7, 4) Hamming code. That is, given a codeword $\mathbf{x} = (x_1, \ldots, x_7)$ for the (7,4) code, define a new codeword $\mathbf{z} = (x_1, \ldots, x_7, x_8)$ by appending a parity check on the existing code bits:

$$x_8 = x_1 \oplus x_2 \oplus \cdots \oplus x_7$$


This  new  code  is  called  an  extended  Hamming  code. 

(a)  What  are  n  and  k  for  the  new  code? 

(b)  What  is  the  minimum  distance  for  the  new  code? 

Problem 7.10 Hamming codes of different lengths can be constructed using the following prescription: the parity check matrix consists of all nonzero binary vectors of length $m$, where $m$ is a positive integer.

(a) What is the value of $m$ for the (7, 4) Hamming code?

(b) For arbitrary $m$, what are the values of code block length $n$ and the number of information bits $k$ as a function of $m$?

Hint:  The  code  block  length  is  the  number  of  columns  in  the  parity  check  matrix.  The  dimension 
of  the  dual  code  is  the  rank  of  the  parity  check  matrix.  Remember  that  row  rank  equals  the 
column  rank.  Which  is  easier  to  find  in  this  case? 

Problem 7.11 BCH codes (named after their discoverers, Bose, Ray-Chaudhuri, and Hocquenghem) are a popular class of linear codes with a well-defined algebraic structure and well-understood algorithms for bounded distance decoding. For a positive integer $m$, we can construct a binary BCH code which can correct at least $t$ errors with the following parameters:

$$n = 2^m - 1, \qquad k \geq n - mt, \qquad d_{min} \geq 2t + 1 \tag{7.46}$$

so that the code rate $R = k/n \geq 1 - \frac{mt}{2^m - 1}$, where the inequality for $k$ is often tight for small values of $t$. For example, Hamming codes are actually $(2^m - 1, 2^m - 1 - m)$ BCH codes with $t = 1$.
Remark: The price of increasing the block length of a BCH code is decoding complexity. Algebraic decoding of a code of length $n = 2^m - 1$ requires operations over $GF(2^m)$.

(a)  Consider  a  (1023,  923)  BCH  code.  Assuming  that  the  inequality  for  k  is  tight,  how  many 
errors  can  it  correct? 

Answer:  t  =  10. 

(b)  Assuming  that  the  inequality  for  k  in  (7.46)  is  tight,  what  is  the  rate  of  a  BCH  code  with 
n  =  511  and  t  =  10? 
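When the bound $k \ge n - mt$ in (7.46) is tight, the BCH parameters follow from simple arithmetic. The Python sketch below (function names are our own) recovers m from $n = 2^m - 1$, inverts the bound to get t, and computes the rate.

```python
def bch_params(m, t):
    """(n, k, rate) for a binary BCH code, assuming k >= n - m t holds with equality."""
    n = 2 ** m - 1
    k = n - m * t
    return n, k, k / n

def bch_t(n, k):
    """Invert the tight bound: t = (n - k) / m, where n = 2^m - 1."""
    m = (n + 1).bit_length() - 1
    if 2 ** m - 1 != n:
        raise ValueError("n must be of the form 2^m - 1")
    return (n - k) // m

print(bch_t(1023, 923))     # part (a), with m = 10
print(bch_params(3, 1))     # the (7, 4) Hamming code viewed as a BCH code with t = 1
```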
Problem 7.12 Consider an (n, k) linear code used over a BSC with crossover probability p. The number of errors among the n code bits is $X \sim \mathrm{Bin}(n, p)$. A bounded distance decoder of radius t is used to decode it (assume that the code is capable of correcting at least t errors). The probability of incorrect decoding is therefore given by

$$P_e = P[X > t] = \sum_{k=t+1}^{n} \binom{n}{k} p^k (1-p)^{n-k} \qquad (7.47)$$

The computation in (7.47) is straightforward, but for large n, numerical problems can arise when evaluating the terms in the sum directly, because $\binom{n}{k}$ can take very large values, and $p^k$ can take very small values. One approach to alleviating this problem is to compute the binomial pmf recursively.

(a) Show that

$$P[X = k] = \frac{p}{1-p} \cdot \frac{n-k+1}{k} \, P[X = k-1], \qquad k = 1, \ldots, n \qquad (7.48)$$

(b) Use the preceding, together with the initial condition $P[X = 0] = (1-p)^n$, to write a Matlab program to compute $P[X > t]$.
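Part (b) asks for Matlab; the same recursion is sketched below in Python for illustration (the function name is our own). It builds each pmf value from the previous one via (7.48), so neither $\binom{n}{k}$ nor $p^k$ is formed explicitly, and it sums only the tail terms. It assumes $0 < p < 1$; for very large $np$ the starting value $(1-p)^n$ can underflow, in which case working with logarithms is the usual remedy.

```python
import math

def binomial_tail(n, p, t):
    """P[X > t] for X ~ Bin(n, p), using the recursion (7.48)."""
    ratio = p / (1 - p)
    pmf = (1 - p) ** n                 # P[X = 0]
    tail = pmf if t < 0 else 0.0
    for k in range(1, n + 1):
        pmf *= ratio * (n - k + 1) / k   # P[X = k] from P[X = k-1]
        if k > t:
            tail += pmf
    return tail

print(binomial_tail(1023, 1e-3, 10))
```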

Problem  7.13  Use  the  standard  array  in  Table  7.1  for  an  exact  computation  of  the  probabilities 
of  decoding  failure  and  decoding  error  for  the  (5,  2)  code,  for  bounded  distance  decoding  with 
t  =  1  over  a  BSC  with  crossover  probability  p.  Plot  these  probabilities  as  a  function  of  p  on  a 
log-log  scale. 

Hint: Assume that the all-zero codeword is sent. Find the number and weight of error patterns resulting in decoding failure and decoding error, respectively.
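The counting in the hint can be automated by enumerating all $2^n$ error patterns, as in the Python sketch below. With the all-zero codeword sent, the received word equals the error pattern; bounded distance decoding outputs the codeword within distance t, if any, and declares failure otherwise. The four codewords listed are a hypothetical (5, 2) code with $d_{\min} = 3$, not necessarily the code of Table 7.1; substitute the text's codewords to reproduce the exact answer. Plotting on log-log axes is left out.

```python
from itertools import product

# Hypothetical (5,2) code with d_min = 3; replace with the codewords of Table 7.1.
CODEWORDS = [(0, 0, 0, 0, 0), (1, 0, 1, 1, 0), (0, 1, 1, 0, 1), (1, 1, 0, 1, 1)]

def classify_patterns(codewords, t):
    """Counts of error patterns, indexed by weight, leading to each outcome."""
    n = len(codewords[0])
    counts = {"correct": [0] * (n + 1), "failure": [0] * (n + 1), "error": [0] * (n + 1)}
    for e in product([0, 1], repeat=n):
        w = sum(e)
        near = [c for c in codewords if sum(a != b for a, b in zip(c, e)) <= t]
        if not near:
            counts["failure"][w] += 1
        elif any(near[0]):          # decoded to a nonzero codeword
            counts["error"][w] += 1
        else:
            counts["correct"][w] += 1
    return counts

def probabilities(counts, p):
    """Weight each pattern of weight w by p^w (1-p)^(n-w)."""
    n = len(counts["correct"]) - 1
    return {key: sum(c * p ** w * (1 - p) ** (n - w) for w, c in enumerate(v))
            for key, v in counts.items()}

counts = classify_patterns(CODEWORDS, t=1)
print(counts)
print(probabilities(counts, 0.01))
```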

Problem 7.14 For the binomial tail probability (7.47) associated with the probability of incorrect decoding, we are often interested in large n and relatively small t; for example, consider the (1023, 923) BCH code in Problem 7.11, for which t = 10. While recursive computations as in Problem 7.12 are relatively numerically stable, we are often interested in quick approximations that do not require the evaluation of a large summation with (n − t) terms. In this problem, we discuss some simple approximations.

(a) We are interested in designing systems to obtain small values of $P_e$, hopefully significantly smaller than the input BER p. Argue that $p > t/n$ is an uninteresting regime from this point of view. What is the uninteresting regime for the (1023, 923) BCH code?

(b) For $p \ll t/n$, argue that the sum in (7.47) is well approximated by its first term.

(c) Since X is a sum of n i.i.d. Bernoulli random variables, show that the CLT can be used to approximate its distribution by a Gaussian: $X \sim N(np, np(1-p))$.

(d) For the (1023, 923) BCH code, compute a numerical estimate of the probability of incorrect decoding for t = 10 and p = 10“^ in three different ways: (i) direct computation, (ii) estimation by the first term as in (b), (iii) estimation using the Gaussian approximation.

(e)  Repeat  (d)  for  p  =  10“^. 

(f)  Comment  on  the  match  (or  otherwise)  between  the  three  estimates  in  (d)  and  (e).  What 
happens  with  smaller  p? 
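The three estimates in (d) and (e) can be computed side by side with the Python sketch below (function names are our own). The Gaussian estimate uses $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ without a continuity correction. The values of p in the loop are placeholders for illustration only; use the values specified in (d) and (e).

```python
import math

def tail_direct(n, p, t):
    """Exact P[X > t], summed with the recursion (7.48)."""
    ratio, pmf = p / (1 - p), (1 - p) ** n
    tail = pmf if t < 0 else 0.0
    for k in range(1, n + 1):
        pmf *= ratio * (n - k + 1) / k
        if k > t:
            tail += pmf
    return tail

def tail_first_term(n, p, t):
    """Approximation (b): keep only the k = t + 1 term of (7.47)."""
    k = t + 1
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def tail_gaussian(n, p, t):
    """Approximation (c): X ~ N(np, np(1-p)), so P[X > t] ~ Q((t - np)/sigma)."""
    sigma = math.sqrt(n * p * (1 - p))
    return 0.5 * math.erfc((t - n * p) / (sigma * math.sqrt(2)))

n, t = 1023, 10
for p in (1e-3, 1e-4):   # placeholder values
    print(p, tail_direct(n, p, t), tail_first_term(n, p, t), tail_gaussian(n, p, t))
```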

Problem 7.15 Here are the (n, k, t) parameters for some other binary BCH codes for which the computations of Problem 7.12 can be repeated: (1023, 863, 16), (511, 421, 10), (255, 215, 5).

Problem 7.16 Reed-Solomon (RS) codes are a widely used class of codes on non-binary alphabets. While we do not discuss the algebraic structure of any of the codes we have mentioned, we state in passing that RS codes can be viewed as a special class of BCH codes. The symbols in an RS code come from $GF(2^m)$ (a finite field with $2^m$ elements, where m is a positive integer), hence each symbol can be represented by m bits. The code block length equals $n = 2^m - 1$. The minimum distance is given by

$$d_{\min} = n - k + 1 \qquad (7.49)$$

This is actually the best possible minimum distance attainable for an (n, k) code. It is possible to extend the RS code by one symbol to obtain $n = 2^m$, and to shorten the code to obtain $n < 2^m - 1$, all the while maintaining the minimum distance relationship (7.49). Bounded distance decoding can be used to correct up to $t = \lfloor (d_{\min} - 1)/2 \rfloor = \lfloor (n-k)/2 \rfloor$ errors, or $d_{\min} - 1 = n - k$ erasures, or any pattern of t errors and e erasures satisfying $2t + e + 1 \le d_{\min} = n - k + 1$. One drawback of RS codes is that it is not possible to obtain code block lengths larger than $2^m$, the alphabet size.

(a)  What  is  the  maximum  number  of  symbol  errors  that  a  (255,  235)  RS  code  can  correct?  How 
many  bits  does  each  symbol  represent?  In  the  worst  case,  how  many  bits  can  the  code  correct? 
How  about  in  the  best  case? 

(b)  The  (255,  235)  RS  code  is  used  as  an  outer  code  in  a  system  in  which  the  inner  code  produces 
a  BER  of  10“^.  What  is  the  symbol  error  probability,  assuming  that  the  bit  errors  are  i.i.d.? 
Assuming  bounded  distance  decoding  up  to  the  maximum  possible  number  of  correctable  errors, 
find  the  probability  of  incorrect  decoding. 

Note: The symbol error probability is $p = 1 - (1 - p_b)^m$, where $p_b$ is the BER and m the number of bits per symbol.

(c)  What  is  the  BER  that  the  inner  code  must  produce  in  order  for  the  (255,  235)  RS  code  to 
attain  a  decoding  failure  probability  of  less  than  10“^^? 

(d)  If  the  BER  of  the  inner  code  is  fixed  at  10“^  and  the  block  length  and  alphabet  size  of  the 
RS  code  are  as  in  (b)-(c),  what  is  the  value  of  k  for  which  the  decoding  failure  probability  is  less 
than  10“^^? 

Remark:  While  we  consider  random  bit  errors  in  this  problem,  inner  decoders  may  often  output 
a  burst  of  errors,  and  this  is  where  outer  RS  codes  become  truly  valuable.  For  example,  a  burst  of 
errors  spanning  30  bits  corresponds  to  at  most  5  symbol  errors  in  an  RS  code  with  8-bit  symbols. 
On  the  other  hand,  correcting  up  to  30  errors  using,  say,  a  binary  BCH  code  would  cost  a  lot  in 
terms  of  redundancy. 
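The chain of computations in (b) can be sketched in Python as follows (function names are our own; the text uses Matlab): t follows from (7.49), the symbol error probability from the note in (b), and the probability of incorrect decoding from the binomial tail (7.47) with symbols in place of bits. The expression $1 - (1-p_b)^m$ is evaluated with `expm1` and `log1p` to avoid cancellation when $p_b$ is tiny. The BER of $10^{-3}$ below is an illustrative value, not the one specified in the problem.

```python
import math

def binomial_tail(n, p, t):
    """P[X > t] for X ~ Bin(n, p), via the recursion (7.48)."""
    ratio, pmf = p / (1 - p), (1 - p) ** n
    tail = pmf if t < 0 else 0.0
    for k in range(1, n + 1):
        pmf *= ratio * (n - k + 1) / k
        if k > t:
            tail += pmf
    return tail

def rs_incorrect_decoding(n, k, m, pb):
    """(t, symbol error probability, P[more than t symbol errors]) for an (n, k)
    RS code over GF(2^m), assuming i.i.d. bit errors with probability pb."""
    t = (n - k) // 2                           # from d_min = n - k + 1
    ps = -math.expm1(m * math.log1p(-pb))      # 1 - (1 - pb)^m
    return t, ps, binomial_tail(n, ps, t)

print(rs_incorrect_decoding(255, 235, 8, 1e-3))
```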


LLR  computations 


Problem 7.17 Consider a BPSK system with a typical received sample given by

$$Y = A(-1)^x + N \qquad (7.50)$$

where A > 0 is the amplitude, $x \in \{0, 1\}$ is the transmitted bit, and $N \sim N(0, \sigma^2)$ is the noise. Let $\pi_0 = P[x = 0]$ denote the prior probability that x = 0.

(a) Show that the LLR

$$L(x) = \log \frac{P[x = 0 \mid y]}{P[x = 1 \mid y]} = \log \frac{\pi_0 \, p(y|0)}{(1 - \pi_0) \, p(y|1)}$$

Conclude that

$$L(x) = L_{\text{channel}}(x) + L_{\text{prior}}(x)$$

where $L_{\text{channel}}(x) = \log \frac{p(y|0)}{p(y|1)}$ and $L_{\text{prior}}(x) = \log \frac{\pi_0}{1 - \pi_0}$.

(b) Write down the conditional densities $p(y|x = 0)$ and $p(y|x = 1)$.

(c) Show that the channel LLR $L_{\text{channel}}(x)$ is given by

$$L_{\text{channel}}(x) = \log \frac{p(y|0)}{p(y|1)} = \frac{2Ay}{\sigma^2}$$

(d) Specify (in terms of the parameters A and σ) the conditional distribution of $L_{\text{channel}}$, conditioned on x = 0 and x = 1.

(e) Suppose that the preceding is used to model either the I sample or the Q sample of a Gray coded QPSK system. Express $E_s/N_0$ for the system in terms of A and σ.

Answer: $\frac{E_s}{N_0} = \frac{A^2}{\sigma^2}$

(f) Suppose that we use BICM with a binary code of rate $R_{\text{code}}$ prior to QPSK modulation. Express $E_s/N_0$ for the QPSK symbols in terms of $E_b/N_0$.

Answer: $\frac{E_s}{N_0} = 2 R_{\text{code}} \frac{E_b}{N_0}$


(g) For $E_b/N_0$ of 3 dB and a rate 2/3 binary code, what is the value of A if the noise variance per dimension is scaled to $\sigma^2 = 1$?

(h) For the system parameters in (g), specify numerical values for the parameters governing the conditional distributions of the LLR found in (c).

(i) For the system parameters in (g), specify the probability of bit error for hard decisions based on Y.
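A short Monte Carlo run is a useful check on (c), (d) and (i). The Python sketch below (the text uses Matlab; function names are our own) draws samples of $L_{\text{channel}} = 2AY/\sigma^2$ conditioned on each bit and prints their sample mean and variance, which should be close to $\pm 2A^2/\sigma^2$ and $4A^2/\sigma^2$, since the LLR is a scaled Gaussian. It also prints the hard-decision error probability $Q(A/\sigma)$. The values $A = \sigma = 1$ are illustrative; substitute the parameters from (g).

```python
import math
import random

def channel_llrs(A, sigma, bit, num, rng):
    """Samples of L_channel = 2 A Y / sigma^2 with Y = A(-1)^x + N."""
    s = A if bit == 0 else -A
    return [2 * A * (s + rng.gauss(0.0, sigma)) / sigma ** 2 for _ in range(num)]

def mean_var(v):
    mu = sum(v) / len(v)
    return mu, sum((x - mu) ** 2 for x in v) / (len(v) - 1)

rng = random.Random(1)
A, sigma = 1.0, 1.0
for bit in (0, 1):
    print(bit, mean_var(channel_llrs(A, sigma, bit, 100000, rng)))

# Hard decisions on the sign of Y err with probability Q(A / sigma)
print(0.5 * math.erfc(A / (sigma * math.sqrt(2))))
```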


Problem 7.18 In this problem, we derive the tanh rule (7.42) for the check update, hopefully in a way that provides some insight into where the tanh comes from.

(a) For any bit x with LLR L, we have observed that $P[x = 0] = \frac{e^L}{1 + e^L}$. Now, show that

$$\delta = P[x = 0] - \frac{1}{2} = \frac{1}{2} \cdot \frac{e^L - 1}{e^L + 1} = \frac{1}{2} \tanh(L/2) \qquad (7.51)$$

Thus, the tanh provides a measure of how much the distribution of x deviates from an equiprobable distribution.

Now, suppose that $x_3 = x_1 \oplus x_2$, where $x_1$ and $x_2$ are modeled as independent for the purpose of belief propagation. Let $L_i$ denote the LLR for $x_i$, and set $P[x_i = 0] - \frac{1}{2} = \delta_i$, $i = 1, 2, 3$. (Note that $P[x_i = 1] = \frac{1}{2} - \delta_i$.) Under our model,

$$P[x_3 = 0] = P[x_1 = 0] P[x_2 = 0] + P[x_1 = 1] P[x_2 = 1]$$

(b) Plug in expressions for these probabilities in terms of the $\delta_i$ and simplify to show that

$$\delta_3 = 2 \delta_1 \delta_2$$

(c) Use the result in (a) to infer the tanh rule

$$\tanh(L_3/2) = \tanh(L_1/2) \tanh(L_2/2)$$
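The tanh rule can be confirmed numerically by computing the LLR of $x_3$ two ways: directly from the probability expression for $P[x_3 = 0]$, and from $\tanh(L_3/2) = \tanh(L_1/2)\tanh(L_2/2)$. The Python sketch below (function names are our own) does this for a few LLR pairs; the two columns should agree to round-off.

```python
import math

def p0_from_llr(L):
    """P[x = 0] for a bit with LLR L = log(P[x=0] / P[x=1])."""
    return 1 / (1 + math.exp(-L))

def xor_llr_direct(L1, L2):
    """LLR of x3 = x1 xor x2 from P[x3=0] = P[x1=0]P[x2=0] + P[x1=1]P[x2=1]."""
    p1, p2 = p0_from_llr(L1), p0_from_llr(L2)
    p3 = p1 * p2 + (1 - p1) * (1 - p2)
    return math.log(p3 / (1 - p3))

def xor_llr_tanh(L1, L2):
    """LLR of x3 from the tanh rule."""
    return 2 * math.atanh(math.tanh(L1 / 2) * math.tanh(L2 / 2))

for L1, L2 in [(1.0, 2.0), (-0.5, 3.0), (4.0, -4.0)]:
    print(L1, L2, xor_llr_direct(L1, L2), xor_llr_tanh(L1, L2))
```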


Constellation points, left to right: −3A, −A, +A, +3A, with labels 00, 10, 11, 01.

Figure 7.17: Gray coded 4PAM constellation.


Problem 7.19 Consider the Gray coded 4PAM constellation depicted in Figure 7.17. Denote the label for each constellation point by $x_1 x_2$, where $x_1, x_2 \in \{0, 1\}$. The received sample is given by

$$Y = s + N$$

where $s \in \{-3A, -A, A, 3A\}$ is the transmitted symbol, and $N \sim N(0, \sigma^2)$ is noise.

(a) Find expressions for the channel LLRs for the two bits:

$$L_{\text{channel}}(x_1) = \log \frac{p(y|x_1 = 0)}{p(y|x_1 = 1)}, \qquad L_{\text{channel}}(x_2) = \log \frac{p(y|x_2 = 0)}{p(y|x_2 = 1)}$$

Hint: Note that, when $x_2$ is equally likely to be 0 or 1,

$$p(y|x_1 = 0) = \frac{1}{2} \left[ p(y|x_1 x_2 = 00) + p(y|x_1 x_2 = 01) \right]$$

and the factor of 1/2 cancels in the LLR.

(b) Simulate the system for A = 2, normalizing σ = 1, and choosing the bits $x_1$ and $x_2$ independently and with equal probability from {0, 1}. Plot the histograms for $LLR_1$ conditioned on $x_1 = 0$ and conditioned on $x_1 = 1$ on the same plot. Plot the histograms for $LLR_2$ conditioned on $x_2 = 0$ and conditioned on $x_2 = 1$ on the same plot. Are the conditional distributions in each case well separated?

(c) You wish to design a BICM system with a binary code of rate $R_{\text{code}}$ to be used with 4PAM modulation, with A and σ as in (b). Use the formula (7.7) for the discrete time AWGN channel to estimate the code rate to be used, assuming that you can operate 3 dB from the Shannon limit.

Hint: Compute the SNR in terms of A and σ, but then reduce it by 3 dB before plugging into (7.7) to find the bits per channel use.

(d) Repeat (b) and (c) for A = 1, σ = 1.
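For part (a), each conditional density is a sum of two Gaussians, so the LLR is a difference of two log-sum-exp terms. The Python sketch below (the text uses Matlab; names are our own) evaluates both bit LLRs for equiprobable symbols, dropping the Gaussian normalization constant and the factor 1/2, which cancel. Subtracting the largest exponent before exponentiating keeps the computation stable for large $|y|$. The resulting values can be fed to a histogram routine for part (b).

```python
import math

# Gray labels x1x2 of Figure 7.17 mapped to the multiple of A at that point
POINTS = {(0, 0): -3, (1, 0): -1, (1, 1): 1, (0, 1): 3}

def log_sum_exp(vals):
    mx = max(vals)
    return mx + math.log(sum(math.exp(v - mx) for v in vals))

def pam4_bit_llrs(y, A, sigma):
    """Channel LLRs (L1, L2) for bits x1, x2, assuming equiprobable symbols."""
    llrs = []
    for b in range(2):
        num = [-(y - a * A) ** 2 / (2 * sigma ** 2) for lab, a in POINTS.items() if lab[b] == 0]
        den = [-(y - a * A) ** 2 / (2 * sigma ** 2) for lab, a in POINTS.items() if lab[b] == 1]
        llrs.append(log_sum_exp(num) - log_sum_exp(den))
    return tuple(llrs)

for y in (-3.0, -1.0, 0.5, 2.5):
    print(y, pam4_bit_llrs(y, A=2.0, sigma=1.0))
```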


Software  Lab  7.1:  Belief  propagation 

The purpose of this lab is to provide hands-on experience with belief propagation (BP) for decoding. As a warm-up exercise, we first apply BP to our running example (5, 2) code, for which we can compare its performance against bounded distance decoding.

We then introduce array codes, a class of LDPC codes with a simple deterministic construction which can provide excellent performance; while the performance of the array codes we consider here is inferior to that of the best available LDPC codes, the gap can be narrowed considerably by tweaking them (discussion of such modifications is beyond our scope).

1) Write a function implementing belief propagation. The inputs are the parity check matrix, the channel LLRs, and the maximum number of iterations. The outputs are a binary vector which is an estimate of the transmitted codeword, a bit indicating whether this binary vector is a valid codeword, and the number of iterations actually taken. To be concrete, we start defining the function below.

function [xhat, valid_codeword, iter] = belief_propagation(H, Lchannel, max_iter)
%% INPUTS
% H = parity check matrix
% Lchannel = LLRs obtained from channel (column vector)
% max_iter = maximum allowed number of iterations
%% OUTPUTS
% xhat = binary vector (estimate of transmitted codeword)
% valid_codeword = 1 if xhat is a codeword
% iter = number of iterations taken to decode

%% NEED TO FILL IN THE FUNCTION NOW

One possible approach to filling in the function is to take the following steps:

(a) Build the Tanner graph: Given the parity check matrix H, find and store the neighbors for each variable node and each check node. This can be done using a cell array, as follows.
%determine number of nodes on each side of the Tanner graph
[number_check_nodes, n] = size(H);

%store indices of edges from variable to check nodes
variables_edges_index = cell(n,1);
for j=1:n
    variables_edges_index{j} = find(H(:,j)==1);
end

%store indices of edges from check to variable nodes
check_node_edges_index = cell(number_check_nodes,1);
for i=1:number_check_nodes
    check_node_edges_index{i} = find(H(i,:)==1)';
end


(b)  Build  the  message  data  structure:  We  can  maintain  messages  (LLRs)  in  a  matrix  of  the 
same  dimension  as  H,  with  nonzero  entries  only  where  H  is  nonzero.  The  jth  variable  node 
will  read/write  its  messages  from/to  the  jth  column,  while  the  ith  check  node  will  read/write 
its  messages  from/to  the  ith  row.  Initialize  messages  from  variable  nodes  to  the  channel  LLRs, 
and  from  check  nodes  to  zeros.  We  maintain  two  matrices,  one  corresponding  to  messages  from 
variable  nodes,  and  one  corresponding  to  messages  from  check  nodes. 


%messages from variable nodes
Lout_variables = H.*repmat(Lchannel', number_check_nodes, 1);
%messages from check nodes
Lout_check_nodes = zeros(size(H));

(c) Implement message passing: We can now use the variable update and check update functions (code fragments 7.5.1 and 7.5.2, respectively), along with the preceding data structure, to implement message passing.


%initialize message passing
valid_codeword = 0; %indicates valid codeword found
iter = 0;

while (iter<max_iter && ~valid_codeword)
    %loop over check nodes to generate messages
    for i=1:number_check_nodes
        Lout_check_nodes(i,check_node_edges_index{i}) = check_update(Lout_variables(i,check_node
    end
    %loop over variable nodes
    for j=1:n
        Lout_variables(variables_edges_index{j},j) = variable_update(Lchannel(j),Lout_check_node
    end
    %check for valid codeword
    bhat = sign(sum(Lout_check_nodes)' + Lchannel); %hard decisions +1,-1
    xhat = (1-bhat)/2; %convert hard decisions from {+1,-1} to {0,1}
    if (mod(H*xhat,2)==zeros(number_check_nodes,1))
        valid_codeword = 1;
    end
    iter = iter + 1;
end


Putting  (a)-(c)  together  gives  the  desired  function. 
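For readers who want to cross-check their Matlab function, here is a compact Python sketch of a flooding-schedule sum-product decoder, using the tanh rule for the check update and the sum of extrinsic LLRs for the variable update. It is not a transcription of code fragments 7.5.1 and 7.5.2, whose exact form is not reproduced here; messages are stored per edge in dictionaries rather than in matrices shaped like H, and the clipping constant is an arbitrary choice that keeps `atanh` finite.

```python
import math

def belief_propagation(H, llr, max_iter=50):
    """Sum-product decoding sketch. H: list of 0/1 rows; llr: channel LLRs
    (positive favours bit 0). Returns (xhat, valid_codeword, iterations)."""
    m, n = len(H), len(H[0])
    check_nbrs = [[j for j in range(n) if H[i][j]] for i in range(m)]
    var_nbrs = [[i for i in range(m) if H[i][j]] for j in range(n)]
    v2c = {(i, j): llr[j] for i in range(m) for j in check_nbrs[i]}  # variable -> check
    c2v = {edge: 0.0 for edge in v2c}                                 # check -> variable
    xhat, valid, iters = [], False, 0
    while iters < max_iter and not valid:
        # check update: tanh rule over all other incoming edges
        for i in range(m):
            for j in check_nbrs[i]:
                prod = 1.0
                for jj in check_nbrs[i]:
                    if jj != j:
                        prod *= math.tanh(v2c[(i, jj)] / 2)
                prod = max(min(prod, 1 - 1e-12), -1 + 1e-12)
                c2v[(i, j)] = 2 * math.atanh(prod)
        # variable update: channel LLR plus all check messages, minus the target edge
        total = [llr[j] + sum(c2v[(i, j)] for i in var_nbrs[j]) for j in range(n)]
        for j in range(n):
            for i in var_nbrs[j]:
                v2c[(i, j)] = total[j] - c2v[(i, j)]
        xhat = [0 if L >= 0 else 1 for L in total]
        valid = all(sum(xhat[j] for j in check_nbrs[i]) % 2 == 0 for i in range(m))
        iters += 1
    return xhat, valid, iters

# Example: a (7,4) Hamming parity check matrix, all-zero codeword, one unreliable bit
H = [[1, 1, 1, 0, 1, 0, 0], [0, 1, 1, 1, 0, 1, 0], [1, 1, 0, 1, 0, 0, 1]]
llr = [2.0] * 7
llr[2] = -0.5
print(belief_propagation(H, llr))
```

Hard decisions use the total LLR (channel LLR plus all incoming check messages), which corresponds to the `sign(sum(Lout_check_nodes)' + Lchannel)` step in the Matlab loop.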
2) Write a program to check that the preceding belief propagation function works for our example (5, 2) code, using the parity check matrix corresponding to the Tanner graph in Figure 7.13, and generating the channel LLRs as described in Problem 7.17. Specifically, consider Gray coded QPSK modulation, where the I and Q components follow the BPSK model in Problem 7.17. Note that A and σ in Problem 7.17 must be chosen appropriately (fix one, say σ = 1, and scale the other) based on the spectral efficiency r (r = 4/5 for QPSK with the (5, 2) code) and $E_b/N_0$. Assume, without loss of generality, that the all-zero codeword is sent. Decoding error therefore occurs when the belief propagation function returns a nonzero codeword, or reports that a valid codeword was not found after the maximum allowed number of iterations.
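The scaling of A can be sketched as follows, in Python rather than the text's Matlab (the function name is our own). Each coded bit rides on one BPSK dimension with bit 0 mapped to +A, so for the all-zero codeword every sample is $Y = A + N$. With $E_s/N_0 = A^2/\sigma^2$ for the QPSK model of Problem 7.17 and $E_s = r E_b$, we get $A = \sigma\sqrt{r E_b/N_0}$, and each LLR is $2AY/\sigma^2$.

```python
import math
import random

def qpsk_bicm_llrs(n_bits, ebno_db, r, rng, sigma=1.0):
    """Channel LLRs for the all-zero codeword on Gray coded QPSK with BICM.
    Returns (llrs, A), where A = sigma * sqrt(r * Eb/N0)."""
    ebno = 10 ** (ebno_db / 10)
    A = sigma * math.sqrt(r * ebno)
    llrs = [2 * A * (A + rng.gauss(0.0, sigma)) / sigma ** 2 for _ in range(n_bits)]
    return llrs, A

rng = random.Random(0)
llrs, A = qpsk_bicm_llrs(5, ebno_db=3.0, r=4 / 5, rng=rng)
print(A, llrs)
```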

3)  Use  simulations  to  estimate  and  plot  the  probability  of  decoding  error  (log  scale)  with  BP 
as  a  function  of  Eb/No  (dB).  On  the  same  graph,  also  plot  the  probability  of  decoding  error  for 
bounded  distance  decoding  with  hard  decisions  (this  can  be  computed  analytically,  as  described 
in  Problem  7.12),  and  the  probability  of  error  for  uncoded  QPSK.  Comment  on  the  results. 
Does  BP  with  soft  decisions  provide  an  improvement  over  bounded  distance  decoding?  Is  the 
performance  better  than  that  of  uncoded  QPSK?  For  your  reference,  an  example  unlabeled  plot 
is  provided  in  Figure  7.18.  Guess  the  labels  for  the  three  plots  before  verifying  them  using  your 
own  computations  and  simulations. 
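The two analytical curves for 3) can be sketched in Python as below (function names are our own). The assumptions are that hard decisions on each coded bit of Gray coded QPSK err with probability $p = Q(\sqrt{2 R E_b/N_0})$ for code rate R = k/n, and that with t = 1 every error pattern of weight two or more leads to decoding failure or error, so the probability of not recovering the transmitted codeword is $P[X > t]$. The uncoded curve is the QPSK bit error probability $Q(\sqrt{2E_b/N_0})$.

```python
import math

def qfunc(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

def uncoded_qpsk_ber(ebno_db):
    """Uncoded Gray coded QPSK: P_b = Q(sqrt(2 Eb/N0))."""
    return qfunc(math.sqrt(2 * 10 ** (ebno_db / 10)))

def bdd_hard_decision(ebno_db, n=5, k=2, t=1):
    """P[X > t] with coded bit error probability p = Q(sqrt(2 (k/n) Eb/N0))."""
    p = qfunc(math.sqrt(2 * (k / n) * 10 ** (ebno_db / 10)))
    return sum(math.comb(n, w) * p ** w * (1 - p) ** (n - w) for w in range(t + 1, n + 1))

for ebno_db in range(0, 11, 2):
    print(ebno_db, uncoded_qpsk_ber(ebno_db), bdd_hard_decision(ebno_db))
```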


Figure  7.18:  Performance  of  the  (5,  2)  code  with  QPSK  modulation,  comparing  belief  propagation 
with  soft  decisions  against  bounded  distance  decoding  with  hard  decisions.  Also  plotted  for 
comparison  is  the  performance  of  uncoded  QPSK.  Which  curve  is  which? 


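Array codes: We now introduce the class of array codes, whose parity check matrix is characterized by three positive integers (p, J, L), and is of the following form:

$$H = \begin{pmatrix} I & I & I & \cdots & I \\ I & P & P^2 & \cdots & P^{L-1} \\ I & P^2 & P^4 & \cdots & P^{2(L-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ I & P^{J-1} & P^{2(J-1)} & \cdots & P^{(J-1)(L-1)} \end{pmatrix} \qquad (7.52)$$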


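where I denotes a p × p identity matrix, and p is a prime number. The matrix P is obtained by cyclically shifting the rows of I by one. Thus, for p = 3, we have

$$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \qquad \text{for } p = 3$$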
The matrix P is a permutation matrix, in the sense that for any p × 1 vector $u = (u_1, \ldots, u_p)^T$, the vector Pu is a permutation of u. For this choice of P, the vector $Pu = (u_2, \ldots, u_p, u_1)^T$ is a cyclic shift of u by one. Raising P to an integer power k simply corresponds to applying k successive cyclic shifts, so that $P^k$ is a cyclic shift of the rows of I by k, and $P^k u$ is a cyclic shift of u by k.
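The cyclic-shift behavior of $P^k$ is easy to confirm. The Python sketch below (helper names are our own) builds P as in the p = 3 example, with row r having its 1 in column (r + 1) mod p, which is also what Matlab's `circshift(eye(p), [0 1])` produces. It then checks that $P^k u$ is u rotated by k positions and that $P^p = I$.

```python
def cyclic_P(p):
    """Identity with rows cyclically shifted by one: row r has a 1 in column (r+1) mod p."""
    return [[1 if c == (r + 1) % p else 0 for c in range(p)] for r in range(p)]

def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(X, u):
    return [sum(a * b for a, b in zip(row, u)) for row in X]

def matpow(X, k):
    R = [[int(i == j) for j in range(len(X))] for i in range(len(X))]  # identity
    for _ in range(k):
        R = matmul(R, X)
    return R

p, u = 5, [10, 20, 30, 40, 50]
for k in range(p + 1):
    print(k, matvec(matpow(cyclic_P(p), k), u))
```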

The parity check matrix H in (7.52) consists of JL blocks of size p × p, with the (j, l)th block being $P^{(j-1)(l-1)}$, $1 \le j \le J$, $1 \le l \le L$. The code length is n = pL, the number of columns (or variable nodes). The column weight equals J for each column (make sure you check this); that is, each variable node has degree J. The number of rows (or check nodes) equals pJ, but some of these rows may be redundant, so the dimension of the dual code satisfies $n - k \le pJ$. In fact, it can be shown that exactly J − 1 rows are redundant. (To see why this might be true, add the first p rows, and then the next p rows. What answers do you get? What does this tell you about the number of linearly independent rows?) Thus, the rank of H equals $n - k = pJ - (J - 1)$, so that the code dimension is $k = p(L - J) + J - 1$. We therefore summarize as follows:

$$n = pL, \qquad k = p(L - J) + J - 1 \qquad \text{for a } (p, J, L) \text{ array code} \qquad (7.53)$$

Popular choices for the variable node degree are J = 3 and J = 4. Analysis of code properties shows that we should restrict L ≤ p. We can, for example, use a large prime p and a moderate sized L, or set L = p for a relatively small value of p.
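The column weight and the rank claim behind (7.53) can be checked for small parameters. The Python sketch below (not the Matlab `array_code` function requested in 4); names are our own) builds H from (7.52) with 0-based block indices, so block (j, l) is $P^{jl}$ and row r of $P^s$ has its 1 in column (r + s) mod p. It then computes the rank over GF(2) and compares k with $p(L - J) + J - 1$.

```python
def array_code_H(p, J, L):
    """Parity check matrix (7.52) as a list of 0/1 rows."""
    H = []
    for j in range(J):
        for r in range(p):
            row = [0] * (p * L)
            for l in range(L):
                row[l * p + (r + j * l) % p] = 1
            H.append(row)
    return H

def gf2_rank(rows):
    """Rank over GF(2) by elimination on rows packed into integers."""
    basis = {}
    for row in rows:
        v = int("".join(map(str, row)), 2)
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)

for p, J, L in [(3, 2, 3), (5, 3, 5), (7, 4, 7)]:
    H = array_code_H(p, J, L)
    n = len(H[0])
    col_weights = {sum(row[c] for row in H) for c in range(n)}
    print((p, J, L), n, n - gf2_rank(H), p * (L - J) + J - 1, col_weights)
```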

4) Write a function to generate the parity check matrix of an array code, whose inputs are p, J, L and whose outputs are H, n, k.


function [H,n,k] = array_code(p,J,L)
%Generates the parity check matrix for an array code
%% INPUTS
%p is a prime
%J = variable node degree (column weight), usually set to 3 or 4
%L = parameter <= p that determines code length
%% OUTPUTS
%H = parity check matrix
%n = pL (code length)
%k = p(L-J)+J-1 (number of info bits)

%p times p identity matrix
Iblock = eye(p);

%can use Matlab's circshift operation on Iblock to generate P and its powers
%for example, circshift(Iblock, [0 (j-1)*(l-1)]) generates the (j,l)th block of H
%% NOW FILL IN THE FUNCTION! %%

5) Consider an array code with p = 11, with L = p and J = 4, used as before with Gray coded QPSK and BICM. As before, use simulations to estimate and plot the probability of decoding error (log scale) with BP as a function of $E_b/N_0$ (dB) for a BICM system employing QPSK. Compare the performance ($E_b/N_0$ for decoding error probability of 10“^) with the Shannon limit for that spectral efficiency. To limit the simulation cost, you may wish to use a relatively small number of simulation runs to generate your plots, and to estimate the value of $E_b/N_0$ at which the probability of decoding error starts falling below, say, 10“^, and then use a larger number of runs for a few carefully chosen values of $E_b/N_0$ to see when the decoding error probability hits 10“^. How does this $E_b/N_0$ compare with that required for 10“^ BER with uncoded QPSK?
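The Shannon limit comparison in 5) needs the spectral efficiency of each array code. The Python sketch below (function names are our own) takes n and k from (7.53), sets r = 2k/n for QPSK, and evaluates the minimum $E_b/N_0$ for reliable communication at r bits per complex symbol on the AWGN channel, $(2^r - 1)/r$. It assumes the text's spectral efficiency is measured in bits per complex symbol, as in the (5, 2) example where r = 4/5.

```python
import math

def shannon_limit_ebno_db(r):
    """Minimum Eb/N0 (dB) at spectral efficiency r: Eb/N0 > (2^r - 1) / r."""
    return 10 * math.log10((2 ** r - 1) / r)

def array_code_qpsk_efficiency(p, J, L):
    """Spectral efficiency with QPSK (2 coded bits per symbol), using (7.53)."""
    n, k = p * L, p * (L - J) + J - 1
    return 2 * k / n

for p, J, L in [(11, 4, 11), (47, 4, 47), (911, 4, 8)]:
    r = array_code_qpsk_efficiency(p, J, L)
    print((p, J, L), round(r, 3), round(shannon_limit_ebno_db(r), 2))
```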

6)  Repeat  5)  for  larger  values  of  the  prime  number  p  (still  keeping  L  =  p  and  J  =  4),  within  the 
limits  of  your  computational  infrastructure.  For  example,  try  p  =  47. 
7) Repeat 5) with large p and relatively small L; for example, p = 911 and L = 8, still keeping J = 4. How do the code rate and spectral efficiency (with QPSK) compare with those in 5) and 6)?

Lab  Report:  Your  lab  report  should  answer  the  preceding  questions  in  order,  and  should  document 
the  reasoning  you  used  and  the  difficulties  you  encountered.  Comment  on  the  decoding  error 
probability  trends  as  you  vary  the  code  parameters. 
Chapter 8

Dispersive  Channels  and  MIMO 


From the material in Chapters 4-6, we now have an understanding of commonly used modulation formats, noise models, and optimum demodulation for the AWGN channel model. Chapter 7 discusses channel coding strategies for these idealized models. In this final chapter, we discuss more sophisticated channel models, and the corresponding signal processing schemes required at the demodulator.

We first consider the following basic model for a dispersive channel: the transmitted signal passes through a linear time-invariant (LTI) system, and is then corrupted by white Gaussian noise. The LTI model is broadly applicable to wireline channels, including copper wires, cable and fiber optic communication (at least over shorter distances, over which fiber nonlinearities can be neglected), as well as to wireless channels with quasi-stationary transmitters and receivers. For wireless mobile channels, the LTI model is a good approximation over durations that are small compared to the time constants of mobility, but still fairly long on an electronic timescale (e.g., of the order of milliseconds). Methods for compensating for the effects of a dispersive channel are generically termed equalization. We introduce two common design approaches for this purpose.

The first approach is singlecarrier modulation, which refers to the linear modulation schemes discussed in Chapter 4, where the symbol sequence modulates a transmit pulse occupying the entire available bandwidth. We discuss linear zero forcing (ZF) and Minimum Mean Squared Error (MMSE) equalization techniques, which are suboptimal from the point of view of minimizing error probability, but are intuitively appealing and less computationally complex than optimum equalization. (We refer the reader to more advanced texts for discussion of optimum equalization and its performance analysis.) We discuss adaptive implementation and geometric interpretation for linear equalizers.

The second approach to channel dispersion is Orthogonal Frequency Division Multiplexing (OFDM), where linear modulation is applied in parallel to a number of subcarriers, each of which occupies a bandwidth which is small compared to the overall bandwidth. OFDM may be viewed as a mechanism for ISI avoidance. It is based on the observation that any complex exponential $e^{j2\pi f_0 t}$ passes through any LTI system with transfer function $H(f)$ unchanged except for multiplication by $H(f_0)$. Thus, we can send a number of complex exponentials $\{e^{j2\pi f_i t}\}$, termed subcarriers, in parallel through the channel, each multiplied by an information-bearing symbol, such that interference across subcarriers is avoided. The task of channel equalization therefore reduces to compensating separately for the channel gain $H(f_i)$ of each subcarrier. Parallelizing the problem of equalization in this manner is particularly attractive when the underlying time domain impulse response $h(t)$ is complicated (e.g., an indoor wireless channel where there are a large number of paths with multiple bounces off walls and ceilings between transmitter and receiver). We discuss how this intuition is translated into practice via transceiver implementations based on digital signal processing (DSP).
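The eigenfunction property behind OFDM has a direct discrete-time counterpart, sketched below in Python (names are our own). The sketch assumes the channel acts as an N-point circular convolution with a short impulse response, which is what a cyclic prefix achieves in practice; the 3-tap channel is arbitrary. Under that assumption the subcarrier $e^{j2\pi kn/N}$ comes out multiplied by the channel gain $H_k = \sum_l h[l] e^{-j2\pi kl/N}$ and is otherwise unchanged.

```python
import cmath

def circular_convolve(x, h):
    """y[n] = sum_l h[l] x[(n - l) mod N]."""
    N = len(x)
    return [sum(h[l] * x[(n - l) % N] for l in range(len(h))) for n in range(N)]

def channel_gain(h, k, N):
    """Gain at normalized frequency k/N: sum_l h[l] exp(-j 2 pi k l / N)."""
    return sum(hl * cmath.exp(-2j * cmath.pi * k * l / N) for l, hl in enumerate(h))

N = 16
h = [1.0, 0.5 - 0.3j, 0.2j]      # arbitrary 3-tap channel, for illustration
for k in (0, 3, 7):
    x = [cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]   # subcarrier k
    y = circular_convolve(x, h)
    Hk = channel_gain(h, k, N)
    err = max(abs(yn - Hk * xn) for yn, xn in zip(y, x))
    print(k, Hk, err)            # err is at round-off level
```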

Finally,  we  discuss  multiple  antenna  communication,  also  popularly  known  as  Multiple  Input 
Multiple Output (MIMO), or space-time, communication. There is a great deal of commonality between signal processing for dispersive channels and for MIMO, which is why we treat these topics within the same chapter. Furthermore, the combination of OFDM with MIMO allows parallelization of transceiver signal processing for complicated channels, and has become the architecture of choice for both WiFi (the IEEE 802.11n standard) and fourth generation cellular systems (LTE, or long term evolution). Three key concepts for MIMO are covered:
beamforming  (directing  energy  towards  a  desired  communication  partner),  diversity  (combating 
fading  by  using  multiple  paths  from  transmitter  to  receiver),  and  spatial  multiplexing  (using 
multiple  antennas  to  support  parallel  data  streams). 

Chapter Plan: Compared to the earlier chapters, this chapter has a somewhat unusual organization. For dispersive channels, a key goal is to provide hands-on exposure via software labs. A model for singlecarrier linear modulation over a dispersive channel, including code fragments for modeling the transmitter and the channel, is presented in Section 8.1. Linear equalization is discussed in Section 8.2. Sections 8.1 and 8.2.1 provide just enough background, including code fragments, for Software Lab 8.1 on adaptive implementation of linear equalization. Section 8.2.2 provides geometric insight into why the implementation in Software Lab 8.1 works, and provides a framework for analytical computations related to MMSE equalization and the closely related notion of zero-forcing (ZF) equalization. It is not required for actually doing Software Lab 8.1. The key concepts behind OFDM and its DSP-centric implementation are discussed in Section 8.3, whose entire focus is to provide background for developing a simplified simulation model for an OFDM link in Software Lab 8.2. Finally, MIMO is discussed in Section 8.4, with the signal processing concepts for MIMO communication reinforced by Software Lab 8.3. The problems at the end of this chapter focus on linear equalization concepts discussed in Section 8.2.2, and on performance evaluation of core MIMO techniques (beamsteering, diversity and spatial multiplexing) discussed in Section 8.4.


8.1  Singlecarrier  System  Model 

We  first  provide  a  system-level  overview  of  singlecarrier  linear  modulation  over  a  dispersive 
channel.  Figure  8.1  shows  block  diagrams  corresponding  to  a  typical  DSP-centric  realization  of 
the  transceiver.  The  DSP  operations  are  performed  on  digital  streams  at  an  integer  multiple 
of  the  symbol  rate,  denoted  by  m/T.  For  example,  we  might  choose  m  =  4  for  implementing 
the  transmit  and  receive  filters,  but  we  might  subsample  the  output  of  the  receive  filter  down 
to  m  =  2  before  implementing  an  equalizer.  We  model  the  core  components  of  such  a  system 
using the complex baseband representation, as shown in Figure 8.2. Given the equivalence of passband and complex baseband, the only effects we skip modeling are the finite precision effects due to digital-to-analog conversion (DAC) and analog-to-digital conversion (ADC). These effects can easily be incorporated into models such as those we develop, but are beyond our current scope.

We  focus  on  a  hands-on  development  of  the  key  ideas  using  discrete  time  simulation  models, 
illustrated  by  code  fragments. 


8.1.1  Signal  Model 

We  begin  with  an  example  of  linear  modulation,  to  see  how  ISI  arises  and  can  be  modeled. 
Consider  linear  modulation  using  BPSK  with  a  sine  pulse,  which  leads  to  a  transmitted  baseband 
waveform  shown  in  Figure  8.3(a).  The  Matlab  code  used  for  generating  this  plot  is  given  below. 
We  have  sampled  much  faster  than  the  symbol  rate  (at  32/T)  in  order  to  obtain  a  smooth  plot. 
In practice, we would typically sample at a smaller multiple of the symbol rate (e.g., at 4/T) to
generate  the  input  to  the  DAC  in  Figure  8.1. 
Figure 8.1: Typical DSP-centric transceiver realization. Our model does not include the blocks shown in dashed lines. Finite precision effects due to digital-to-analog conversion (DAC) and analog-to-digital conversion (ADC) are not considered. The upconversion and downconversion operations are not modeled. The passband channel is modeled as an LTI system in complex baseband.
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Figure  8.2:  Block  diagram  of  a  linearly  modulated  system,  modeled  in  complex  baseband. 
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Figure  8.3:  The  outputs  of  the  transmit  and  receive  hlter  without  channel  dispersion.  The 
symbols  can  be  read  off  from  sampling  each  waveform  at  the  times  indicated  by  the  stem  plot. 

We  provide  Matlab  code  fragments  that  convey  the  concepts  underlying  discrete-time  modeling 
and  implementation.  The  code  fragments  also  show  how  some  of  the  plots  here  are  generated, 
with  cosmetic  touches  omitted. 

The  following  code  fragment  shows  how  to  work  with  discrete  time  samples  using  oversampling 
at  rate  m/T,  including  how  to  generate  the  plot  of  the  transmitted  waveform  in  Figure  8.3(a). 


Code  Fragment  8.1.1  (Transmitted  waveform) 

%choose large oversampling factor for smooth plots
oversampling_factor = 32;
m = oversampling_factor; %for brevity
%generate sine pulse
time_over_symbol = cumsum(ones(m,1))-1;
transmit_filter = sin(time_over_symbol*pi/m);

%number of symbols
nsymbols = 10;
%BPSK symbol generation
symbols = sign(rand(nsymbols,1)-.5);

%express symbol sequence at oversampled rate using zeropadding
%(starts and ends with nonzero symbols)
Lpadded = m*(nsymbols-1)+1; %length of zeropadded sequence
symbolspadded = zeros(Lpadded,1); %initialize
symbolspadded(1:m:Lpadded) = symbols; %fill in bit values every m entries
%now all convolutions can be performed in oversampled domain
transmit_output = conv(symbolspadded,transmit_filter);

%plot transmitted waveform and sampling times
t1 = (cumsum(ones(length(transmit_output),1))-1)/m; %column vector of times in units of T
figure;
plot(t1,transmit_output,'b');
xlabel('t/T');
hold on;
%choose sampling times in accordance with peak of transmit filter response
[maxval maxloc] = max(transmit_filter); %find peak location
sampling_times = maxloc:m:(nsymbols-1)*m+maxloc;
sampled_outputs = transmit_output(sampling_times);
stem((sampling_times-1)/m,sampled_outputs,'r');
hold off;

If this waveform now goes through an ideal channel, and we use a receive filter with impulse response matched to the transmitted pulse, then the waveform we obtain is shown in Figure 8.3(b). The transmit filter impulse response is time-limited to an interval of length T, so its translates by integer multiples of T do not overlap; it is therefore square root Nyquist (see Chapter 4). Hence the net response to a single symbol, which is the cascade of the transmit filter with its matched filter, is Nyquist. It follows that, by sampling at the right moments (as marked on the plot), we can recover the symbols exactly.
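The Nyquist argument can be confirmed numerically. The following Python sketch (standard library only) is a stand-in for code fragments 8.1.1 and 8.1.2 with an ideal channel; the helper names are illustrative, not from the text. With the matched filter output scaled by 1/m as in code fragment 8.1.2, each sample taken at the peak of the net response equals half the corresponding symbol (up to rounding), because the pulses for different symbols do not overlap.

```python
import math
import random


def conv(x, h):
    """Full linear convolution of two lists (like Matlab's conv)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def ideal_channel_samples(symbols, m=32):
    """BPSK with a sine pulse over an ideal channel, matched filter,
    sampled at the peak of the net response to a single symbol."""
    # Sine pulse, time-limited to one symbol interval (m samples).
    tx_filter = [math.sin(math.pi * k / m) for k in range(m)]
    # Zero-padded symbol sequence at the oversampled rate m/T.
    padded = [0.0] * (m * (len(symbols) - 1) + 1)
    padded[::m] = symbols
    tx_output = conv(padded, tx_filter)
    # Matched filter (real pulse, so no conjugation), output scaled by 1/m.
    rx_filter = tx_filter[::-1]
    rx_output = [v / m for v in conv(tx_output, rx_filter)]
    # Net response to one symbol; its peak fixes the sampling phase.
    rx_pulse = [v / m for v in conv(tx_filter, rx_filter)]
    peak = max(range(len(rx_pulse)), key=rx_pulse.__getitem__)
    return [rx_output[peak + k * m] for k in range(len(symbols))]


random.seed(1)
symbols = [random.choice([-1.0, 1.0]) for _ in range(10)]
samples = ideal_channel_samples(symbols)
decisions = [1.0 if s > 0 else -1.0 for s in samples]
print(decisions == symbols)  # → True
```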

We  now  provide  a  code  fragment  to  model  the  channel  and  receive  filter;  it  can  be  employed  for 
modeling  both  ideal  and  dispersive  channels.  Appending  it  to  code  fragment  8.1.1  generates  and 
plots  the  noiseless  received  waveform. 

Code  Fragment  8.1.2  (Modeling  the  channel  and  receive  filter) 

dispersive = 0; %Set this to 0 for ideal channel, and to 1 for dispersive channel
if dispersive == 0,
    channel = 1;
else
    channel = [0.8;zeros(m/2,1);-0.7;zeros(m,1);-0.6];
    %(or substitute your favorite choice of dispersive channel)
end

%noiseless receiver input
receive_input = conv(transmit_output,channel);
t2 = (cumsum(ones(length(receive_input),1))-1)/m;
figure;
plot(t2,receive_input);
xlabel('t/T');

%receive filter matched to transmit filter
%(would also need to conjugate if complex-valued)
receive_filter = flipud(transmit_filter);

%receive filter output (normalized to account for oversampling)
receive_output = (1/m)*conv(receive_input,receive_filter);
t3 = (cumsum(ones(length(receive_output),1))-1)/m;

%plot receive filter output together with sample locations chosen based on peak of net response
figure;
plot(t3,receive_output,'b');
xlabel('t/T');
hold on;

%effective pulse at channel output
pulse = conv(transmit_filter,channel);
%effective pulse at receive filter output (normalized to account for oversampling)
rx_pulse = conv(pulse,receive_filter)/m;

[maxval maxloc] = max(rx_pulse);
rx_sampling_times = maxloc:m:(nsymbols-1)*m+maxloc;
rx_sampled_outputs = receive_output(rx_sampling_times);
stem((rx_sampling_times-1)/m,rx_sampled_outputs,'r');
hold off;

Figure 8.4 shows a dispersive channel and the corresponding noiseless receive filter output. The effective pulse given by the cascade of the transmit, channel and receive filters is no longer Nyquist, hence we do not expect a symbol decision based on a single sample to be reliable. Figure 8.4(b) shows the severe distortion due to ISI with a “best effort” choice of sampling times (chosen based on the peak of the effective pulse). In particular, for the specific symbol sequence shown, one (out of ten) of the symbol estimates obtained by taking the signs of these samples is incorrect.

Figure 8.4: (a) Dispersive channel. (b) Receive filter output. When the transmitted waveform passes through the dispersive channel shown, we can no longer read off the symbols reliably by sampling the output of the receive filter. For this particular set of symbols, one of the symbols is estimated incorrectly, even though there is no noise.
Figure 8.5: Eye diagrams with and without channel dispersion: (a) ideal channel; (b) dispersive channel. The eye is closed for the channel considered, which means that reliable symbol decisions are not possible without equalization.


Eye diagrams: A classical technique for visualizing the effect of ISI is the eye diagram. It is constructed by overlaying multiple segments of the received waveform over a fixed window, which shows how different combinations of symbols could potentially create ISI. For an ideal channel and square root Nyquist pulses at either end, the eye is open, as shown in Figure 8.5(a). However, for the dispersive channel of Figure 8.4, we see from Figure 8.5(b) that the eye is closed. An open eye implies that, by an appropriate choice of sampling times, we can make reliable single-sample symbol decisions, while a closed eye means that more sophisticated equalization techniques are needed for symbol recovery.
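For BPSK, whether the eye is open at a given sampling phase can be quantified by the worst-case eye opening: the peak of the net response minus the summed magnitudes of its other symbol-spaced samples. A positive value means that every symbol pattern still yields the correct sign at that phase; a negative value means that some pattern produces a wrong sign even without noise. The Python sketch below (standard library only; function names are illustrative) evaluates this metric for the ideal channel and for the dispersive channel of code fragment 8.1.2, at the peak sampling phase used there.

```python
import math


def conv(x, h):
    """Full linear convolution of two lists (like Matlab's conv)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def net_pulse(channel, m=32):
    """Sine transmit pulse -> channel -> matched filter, scaled by 1/m."""
    tx = [math.sin(math.pi * k / m) for k in range(m)]
    return [v / m for v in conv(conv(tx, channel), tx[::-1])]


def worst_case_eye_opening(pulse, m=32):
    """Peak value minus summed magnitudes of the symbol-spaced ISI samples."""
    peak = max(range(len(pulse)), key=pulse.__getitem__)
    isi = sum(abs(pulse[i]) for i in range(peak % m, len(pulse), m) if i != peak)
    return pulse[peak] - isi


m = 32
ideal = [1.0]
# Dispersive channel of code fragment 8.1.2.
dispersive = [0.8] + [0.0] * (m // 2) + [-0.7] + [0.0] * m + [-0.6]
print("ideal:", round(worst_case_eye_opening(net_pulse(ideal, m), m), 3))
print("dispersive:", round(worst_case_eye_opening(net_pulse(dispersive, m), m), 3))
```

For the ideal channel the opening is the full peak value 0.5; for the dispersive channel it comes out negative, consistent with the closed eye of Figure 8.5(b).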

Physically, an eye diagram can be generated using an oscilloscope with the baseband modulated signal as the vertical input, with horizontal sweep triggered at the symbol rate. A code fragment for generating the eye pattern from discrete time samples at rate m/T is given below. (While Matlab has its own eye diagram routine, this code fragment is provided in order to clearly convey the concept.) The output of the receive filter generated in code fragment 8.1.2 is the input to this fragment, but in general, we could plot an eye diagram based on the baseband waveform at any stage in the system. For complex baseband signals, we would plot the eye diagrams for the I and Q components separately.

Code  Fragment  8.1.3  (Eye  diagram) 

%remove edge effects before doing eye diagram
r1 = receive_output(m/2:m/2+nsymbols*m-1);

%horizontal display length in number of symbol intervals
%(reshape below requires nsymbols to be a multiple of K)
K = 2;

%break into non-overlapping traces
R1 = reshape(r1,K*m,length(r1)/(K*m));
%now enforce continuity across traces
%(append to each trace the first element of the next trace)
row1 = R1(1,:);
L = length(row1);
row_pruned = row1(2:L);
R_pruned = R1(:,1:L-1);
R2 = [R_pruned;row_pruned];

time = (0:K*m)/m; %time as a multiple of symbol interval
plot(time,R2);
xlabel('t/T');


8.1.2  Noise  Model  and  SNR 

In continuous time, our model for the noisy input to the receive filter is

$$y(t) = \sum_n b[n]\, p(t - nT) + n(t) \tag{8.1}$$

where $p(t) = (g_{TX} * g_C)(t)$ is the “effective pulse” given by the cascade of the transmit pulse and the channel filter, $\{b[n]\}$ is the symbol sequence, which is in general complex-valued, and $n(t)$ is complex WGN with PSD $\sigma^2 = N_0/2$ per dimension. We translate this model directly into discrete time by constraining $t = kT/m + \tau$, where $m/T$ is the sampling rate ($m$ a positive integer) and $\tau$ equals the sampling offset. The noise at the input to the receive filter is now modeled as discrete time white Gaussian noise (WGN) with variance $\sigma^2 = N_0/2$ per dimension. As we well know from Chapter 6, the absolute value of the noise variance is meaningless unless we also specify the signal scaling, hence we fix either the signal or noise strength, and set the other based on SNR measures such as $E_b/N_0$ or $E_s/N_0$. Here $E_s = E[|b[n]|^2]\,\|p\|^2$ for the model (8.1), and $E_b = E_s/\log_2 M$ as usual, where $M$ is the constellation size. Inner products and norms are computed in discrete time.

Note that, with the preceding convention, the noise energy in a fixed time interval scales up with the sampling rate, and so does the signal energy (since we have more samples whose energies we are adding up), with the SNR converging to the continuous-time SNR as the sampling rate gets large. However, for a sampling rate that is a small multiple of the symbol rate, the SNR for the discrete time system can, in general, be different from that in the original continuous time system. We do not worry about this distinction here.
We now provide a code fragment which adds discrete time WGN to the receive filter input, resulting in colored noise at the output. We add this to the signal component already computed in code fragment 8.1.2.

Code  Fragment  8.1.4  (Noise  modeling) 

bn_energy = 1; %for BPSK with current normalization

Es = bn_energy*(pulse'*pulse); %pulse is cascade of transmit and channel filters
constellation_size = 2; %for BPSK
Eb = Es/log2(constellation_size);

%specify Eb/N0 in dB
ebnodb = 5;
ebnoraw = 10^(ebnodb/10); %raw Eb/N0
N0 = Eb/ebnoraw;

%noise standard deviation per dimension
sigma = sqrt(N0/2);

%noise at input to receive filter
noise_receive_input = sigma*randn(size(receive_input));
%(would also need to add an imaginary component for complex-valued signals)
%noise_receive_input = noise_receive_input + 1i*sigma*randn(size(receive_input));

%noise at output of receive filter
noise_receive_output = (1/m)*conv(noise_receive_input,receive_filter);

%noisy receive filter output
receive_output_noisy = receive_output + noise_receive_output;
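The energy scaling described in the SNR discussion can be checked numerically. The following Python sketch (standard library only; function names are illustrative) assumes an ideal channel, so that the effective pulse is just the sine transmit pulse sampled at rate m/T, and follows the normalization of code fragment 8.1.4. The discrete-time pulse energy equals m/2, so for a fixed $E_b/N_0$ the per-dimension noise standard deviation sigma grows like $\sqrt{m}$.

```python
import math


def sine_pulse_energy(m):
    """Discrete-time energy of the sine pulse sampled at rate m/T."""
    return sum(math.sin(math.pi * k / m) ** 2 for k in range(m))


def noise_sigma(pulse_energy, ebnodb, symbol_energy=1.0, constellation_size=2):
    """Per-dimension noise standard deviation for a target Eb/N0 in dB,
    using Es = E|b|^2 * ||p||^2 and Eb = Es / log2(M) as in the text."""
    es = symbol_energy * pulse_energy
    eb = es / math.log2(constellation_size)
    n0 = eb / 10 ** (ebnodb / 10)
    return math.sqrt(n0 / 2)


for m in (4, 8, 32):
    energy = sine_pulse_energy(m)
    print(f"m = {m:2d}: pulse energy = {energy:.4f}, sigma at 5 dB = {noise_sigma(energy, 5.0):.4f}")
```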


8.2  Linear  equalization 

We have seen that single-sample symbol decisions are unreliable when the eye is closed. However, what if we are willing to use multiple samples for each symbol decision? Typically, the transmitter and receiver implement fixed filters in DSP at a faster sampling rate than the sampling rate used eventually for equalization. Thus, suppose that we have samples at rate m/T from the output of the receive filter, but we now wish to use rate q/T samples for equalization, where q divides m. For example, we may have m = 4 and q = 2. We subsample the output of the receive filter, taking one out of every m/q samples, and then use L consecutive samples, collected into a vector r[n], to make a decision on symbol b[n]. We would want to choose these samples so that the bulk of the response due to b[n] falls within the observation interval over which we collect these samples. When we want to make a decision on the next symbol b[n + 1], we must slide this observation interval over by T in order to obtain the received vector r[n + 1]. Since our sampling rate is now q/T, this corresponds to an offset of q samples between successive observation intervals. Note that an observation interval typically spans multiple symbol intervals, so that successive observation intervals overlap. Figures 8.6 and 8.7 illustrate this concept for an observation interval of L = 4 samples, with the channel response sampled at rate 2/T, so that q = 2. The overlap between successive observation intervals equals L − q = 2 samples.
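The sliding of observation intervals is easy to see with sample indices in place of sample values. This Python sketch (hypothetical function name; 0-based indices) extracts length-L windows offset by q samples, reproducing the L − q = 2 sample overlap of the example.

```python
def observation_windows(samples, L, q, offset=0):
    """Successive length-L observation vectors r[n] from rate-q/T samples;
    consecutive windows start q samples apart."""
    windows = []
    start = offset
    while start + L <= len(samples):
        windows.append(samples[start:start + L])
        start += q
    return windows


# Sample indices stand in for the equalizer input at rate 2/T.
samples = list(range(10))
wins = observation_windows(samples, L=4, q=2)
for n, w in enumerate(wins):
    print(n, w)
overlap = len(set(wins[0]) & set(wins[1]))
print("overlap:", overlap)  # L - q = 2 samples
```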

The transmit, channel and receive filters are LTI systems, the noise is stationary, and successive symbols are input to the system spaced by time T. Since the discrete time symbol sequence is stationary as well, the statistics of the signal at any stage of the system are invariant to shifts by integer multiples of T. Such periodicity in the statistics is termed cyclostationarity. This implies that the statistics of the noise and ISI seen in different observation intervals are identical: the only change is in which symbol plays the role of desired symbol. In particular, comparing Figures 8.6 and 8.7, we see that the roles of desired and interfering symbols shift by one as we go from the observation interval for b[0] to that for b[1]. Thus, an appropriately designed strategy for handling ISI over a given observation interval should work for other observation intervals as well. This opens up the possibility of realizing adaptive equalizers which can learn enough about the statistics of the ISI and noise to compensate for them.

Figure 8.6: The observation interval used to make a decision on b[0] sees contributions from the desired symbol b[0] and interfering symbols b[−1] and b[1].

Figure 8.7: The observation interval used to make a decision on b[1] sees contributions from the desired symbol b[1] and interfering symbols b[0] and b[2]. Comparing with Figure 8.6, the roles of the symbols have shifted by one.

We focus here on linear equalization, which corresponds to using the decision statistic $c^T r[n]$ to estimate b[n], where c is an appropriately chosen correlator. The choice of c can be independent of n, by virtue of cyclostationarity. For BPSK signaling, for example, this leads to a decision rule

$$\hat{b}[n] = \mathrm{sign}\left(c^T r[n]\right) \tag{8.2}$$


8.2.1  Adaptive  MMSE  Equalization 

While constraining ourselves to linear equalization is suboptimal (discussion of optimal equalization is beyond our present scope), we can try to optimize c to combat ISI and noise. In particular, the linear MMSE criterion corresponds to choosing c so as to minimize the mean squared error (MSE) between the decision statistic and the desired symbol, defined as

$$\mathrm{MSE} = J(c) = E\left[\left(c^T r[n] - b[n]\right)^2\right] \tag{8.3}$$

Minimizing the MSE in this fashion leads to minimizing the contribution due to ISI and noise at the correlator output, which is clearly a desirable outcome.

The  MSE  is  a  quadratic  function  of  c,  and  can  therefore  be  minimized  by  setting  its  gradient 
with  respect  to  c  to  zero.  Due  to  linearity,  the  gradient  can  be  taken  inside  the  expectation,  and 
we  obtain 

$$\nabla_c J(c) = 2E\left[r[n]\left(c^T r[n] - b[n]\right)\right] = 2E\left[r[n]\left(r^T[n]\,c - b[n]\right)\right]$$

Defining

$$R = E\left[r[n]\,r^T[n]\right], \qquad p = E\left[b[n]\,r[n]\right] \tag{8.4}$$

we can rewrite the gradient of the MSE as

$$\nabla_c J(c) = 2\left(Rc - p\right) \tag{8.5}$$

Setting the gradient to zero yields the following expression for the MMSE correlator:

$$c_{MMSE} = R^{-1} p \tag{8.6}$$
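Expanding the square in (8.3) makes the quadratic structure explicit and gives the minimum MSE. This worked derivation assumes, as above, real-valued signals and an invertible (hence positive definite) R:

$$\begin{aligned}
J(c) &= c^T R c - 2\,c^T p + E\left[b^2[n]\right] \\
J(c_{MMSE}) &= E\left[b^2[n]\right] - p^T R^{-1} p \\
J(c) - J(c_{MMSE}) &= \left(c - c_{MMSE}\right)^T R \left(c - c_{MMSE}\right) \ \ge\ 0
\end{aligned}$$

The last line follows from $R c_{MMSE} = p$, and shows directly that (8.6) is the global minimizer of the MSE.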

In order to compute this, we must know, or be able to estimate, the expectations in (8.4). If we know the transmit filter, the channel filter, the receive filter, the sampling times, and the noise PSD, we can compute these expectations using a model such as (8.13). However, we often do not have explicit knowledge of one or more of these quantities. Thus, an attractive approach in practice is to exploit the stationarity of the model as we vary n to estimate expectations using their empirical averages. These expectations involve the received vectors r[n], which we of course have access to, and the symbols b[n], which we assume we have access to over a training period in which a known sequence of symbols is transmitted. This approach leads to adaptive equalization techniques that do not require explicit knowledge or estimates of the model parameters.

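Least Squares Adaptation: Assuming that the first $n_{\mathrm{training}}$ symbols are known, least squares adaptation corresponds to replacing the expectations in (8.4) by their empirical averages as follows:

$$\hat{R} = \frac{1}{n_{\mathrm{training}}} \sum_{n=1}^{n_{\mathrm{training}}} r[n]\,r^T[n], \qquad \hat{p} = \frac{1}{n_{\mathrm{training}}} \sum_{n=1}^{n_{\mathrm{training}}} b[n]\,r[n] \tag{8.7}$$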
where the normalization by $1/n_{\mathrm{training}}$ is not needed, but is put in to make the averaging interpretation transparent. The MMSE correlator is now approximated by the least squares solution:

$$c_{LS} = \hat{R}^{-1} \hat{p} \tag{8.8}$$

This correlator can now be used to make decisions on the unknown symbols following the training period. It can be checked that the preceding solution minimizes the empirical MSE over the training period:

$$\widehat{\mathrm{MSE}} = \sum_{n=1}^{n_{\mathrm{training}}} \left(c^T r[n] - b[n]\right)^2$$
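A minimal Python sketch of least squares adaptation (standard library only), intended as a self-contained illustration rather than a translation of code fragment 8.2.1. It assumes the L = 4 vectors of the example in Section 8.2.2, BPSK symbols, and white Gaussian noise with an illustrative standard deviation of 0.1 (the noise in our simulation chain is colored); all names are hypothetical. It forms the sums in (8.7) over the training period, solves (8.8) by Gaussian elimination, and then decides the payload symbols.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Assumed model: vectors of (8.10) plus white noise (illustrative sigma).
u0 = [-0.5, 1.0, 0.5, -0.25]
u_next = [0.0, 0.0, -0.5, 1.0]   # multiplies b[n+1]
u_prev = [0.5, -0.25, 0.0, 0.0]  # multiplies b[n-1]
sigma, nsymbols, ntraining, L = 0.1, 2000, 200, 4

random.seed(0)
b = [random.choice([-1, 1]) for _ in range(nsymbols + 2)]  # b[0], b[nsymbols+1] pad edges
r = {}
for n in range(1, nsymbols + 1):
    r[n] = [b[n] * x0 + b[n + 1] * xn + b[n - 1] * xp + random.gauss(0.0, sigma)
            for x0, xn, xp in zip(u0, u_next, u_prev)]

train = range(1, ntraining + 1)
# Sums in (8.7); the 1/ntraining factor cancels in (8.8), so it is omitted.
Rhat = [[sum(r[n][i] * r[n][j] for n in train) for j in range(L)] for i in range(L)]
phat = [sum(b[n] * r[n][i] for n in train) for i in range(L)]
c_ls = solve(Rhat, phat)


def empirical_mse(c):
    return sum((dot(c, r[n]) - b[n]) ** 2 for n in train)


payload = range(ntraining + 1, nsymbols + 1)
errors = sum((1 if dot(c_ls, r[n]) > 0 else -1) != b[n] for n in payload)
print("LS correlator:", [round(x, 3) for x in c_ls])
print("payload errors:", errors, "of", len(payload))
```

Perturbing any coordinate of `c_ls` can only increase `empirical_mse`, which is the least squares property stated above.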

Filter implementation of linear equalization: For conceptual clarity, we have introduced linear equalization as a correlator operating on the received vectors {r[n]} obtained by windowing the samples at the output of the receive filter. However, an efficient technique for generating the decision statistics $c^T r[n]$ is to pass the received samples through a discrete time filter matched to c, and then subsample the output at the symbol rate with an appropriate delay.
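The equivalence between windowed correlation and filtering can be checked directly. In this Python sketch (arbitrary sample values, hypothetical helper names), convolving with the time-reversed correlator and subsampling every q samples starting at 0-based index L − 1 + offset reproduces $c^T r[n]$; this corresponds to the 1-based delay `length(h_equalizer)+offset` used in code fragment 8.2.1.

```python
def conv(x, h):
    """Full linear convolution of two lists (like Matlab's conv)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def correlator_outputs(r, c, q, offset, count):
    """Decision statistics c^T r[n], computed window by window."""
    L = len(c)
    return [sum(c[i] * r[q * n + offset + i] for i in range(L)) for n in range(count)]


def filter_outputs(r, c, q, offset, count):
    """Same statistics from one convolution with the time-reversed correlator."""
    y = conv(r, c[::-1])
    start = len(c) - 1 + offset
    return [y[start + q * n] for n in range(count)]


r = [0.3, -1.2, 0.8, 0.5, -0.4, 1.1, -0.7, 0.2, 0.9, -0.3]  # arbitrary samples
c = [0.1, -0.4, 1.0, 0.25]  # arbitrary correlator, L = 4
direct = correlator_outputs(r, c, q=2, offset=1, count=3)
filtered = filter_outputs(r, c, q=2, offset=1, count=3)
print(all(abs(a - b) < 1e-12 for a, b in zip(direct, filtered)))  # → True
```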

The  following  code  fragment  implements  and  tests  least  squares  adaptation,  comparing  it  with 
unequalized  estimates  obtained  by  sampling  at  the  peaks  of  the  net  response  to  a  symbol. 

Code  Fragment  8.2.1  (Least  squares  adaptive  equalization) 

%Use code fragments 8.1.1, 8.1.2 with a large value of nsymbols
%(first ntraining symbols assumed to be known)
ntraining = 100; %number of training symbols (example value)

%Insert noise using code fragment 8.1.4
%downsample to q/T to get input to equalizer
q = 2;
r = receive_output_noisy(1:m/q:length(receive_output_noisy));

%figure out net response to a single symbol at receive filter output
rx_pulse = (1/m)*conv(pulse,receive_filter);
%effective response after downsampling
h = rx_pulse(1:m/q:length(rx_pulse));

%set equalizer length
L = 6;
%choose how to align correlator (e.g., to maximize desired vector energy)
desired_energy = conv(h.^2,ones(L,1));
[max_energy loc_max_energy] = max(desired_energy);
%choose offset to align correlator with desired vector to maximize energy
offset = max(loc_max_energy-L,0)
%another option: set equalizer length equal to effective response
%L = length(h);
%offset = 0;

%initialize for least squares adaptation
phat = zeros(L,1);
Rhat = zeros(L,L);
for n = 1:ntraining,
    rn = r(1+q*(n-1)+offset:L+q*(n-1)+offset); %current received vector r[n]
    phat = phat + symbols(n)*rn;
    Rhat = Rhat + rn*rn';
end

%least squares estimate of MMSE correlator
cLS = Rhat\phat; %often more stable computation than inv(Rhat)*phat
%implement equalizer as filter
h_equalizer = flipud(cLS); %would also need conjugation for complex signals
equalizer_output = conv(r,h_equalizer);
%sample filter output at symbol rate after appropriate delay
delay = length(h_equalizer)+offset;
%symbol decision statistics
decision_stats = equalizer_output(delay:q:delay+(nsymbols-1)*q);

%payload = non-training symbols
payload = symbols(ntraining+1:nsymbols);
%estimate of payload (for BPSK)
payload_estimate = sign(decision_stats(ntraining+1:nsymbols));
%number of errors
nerrors = sum(ne(payload,payload_estimate))

%COMPARE WITH UNEQUALIZED ESTIMATES
%unequalized estimates obtained by sampling at peaks of effective response
[maxval maxloc] = max(h);
sampling_times = maxloc:q:(nsymbols-1)*q+maxloc;
unequalized_decision_stats = r(sampling_times);
%estimate of payload (for BPSK)
payload_estimate_unequalized = sign(unequalized_decision_stats(ntraining+1:nsymbols));
%number of errors
nerrors_unequalized = sum(ne(payload,payload_estimate_unequalized))


Putting code fragments 8.1.1, 8.1.2, 8.1.4 and 8.2.1 together, we obtain a simulation model for adaptive linear equalization over a dispersive channel. As a quick example, for the dispersive channel considered, at $E_b/N_0$ of 7 dB, we estimate (using nsymbols = 10000, ntraining = 100) the error probability after equalization at rate 2/T (q = 2) to be about 3.5 x 10“^ and the unequalized error probability to be about 0.16. Linear equalization is quite effective in this case, although it exhibits some degradation relative to the ideal BPSK error probability of $7.7 \times 10^{-4}$. We can now build on this code base to run a variety of experiments, as suggested in Software Lab 8.1: for example, probability of error as a function of $E_b/N_0$ for different equalizer lengths, for different channel models, and for different choices of the transmit and receive filters. Our model extends easily to complex-valued constellations, as discussed below.
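The ideal (ISI-free) BPSK benchmark is $Q(\sqrt{2E_b/N_0})$. A short Python check, using the standard identity $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$ and hypothetical function names, shows where the ideal value at 7 dB comes from:

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def bpsk_error_probability(ebnodb):
    """Ideal BPSK bit error probability Q(sqrt(2 Eb/N0)), Eb/N0 in dB."""
    ebno = 10 ** (ebnodb / 10)
    return q_function(math.sqrt(2 * ebno))


print(f"{bpsk_error_probability(7.0):.2e}")
```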

Extension to complex-valued signals: All of the preceding development goes through for complex-valued constellations and signals, except that vector transposes $x^T$ are replaced by conjugate transposes $x^H$. Indeed, the Matlab code fragments we provide here already include this level of generality, since we use the conjugate transpose operation x' when computing the transpose for real-valued x. All that is needed to employ these code fragments is to make the symbols complex-valued, and to add an imaginary component to the noise model in code fragment 8.1.4. We skip derivations, and state that the decision statistics are given by $c^H r[n]$, the MSE expression is

$$\mathrm{MSE} = J(c) = E\left[\left|c^H r[n] - b[n]\right|^2\right]$$

and the MMSE solution is given by (8.6) as before, with

$$R = E\left[r[n]\,r^H[n]\right], \qquad p = E\left[b^*[n]\,r[n]\right] \tag{8.9}$$

As before, these statistical expectations can be replaced by empirical averages for a least squares implementation.

We  now  have  the  background  required  for  a  hands-on  exposure  to  equalization  through  Software 
Lab  8.1. 
8.2.2 Geometric Interpretation and Analytical Computations

Computer simulations using the code fragments in Sections 8.1 and 8.2 show that adaptive MMSE equalization works well, at least in the specific examples considered in these sections, and in Software Lab 8.1. We now develop geometric insight into why linear equalization works well when it does, and when it might run into trouble. We stick with real-valued signals, but the results extend easily to complex-valued signals, as noted in the appropriate places. This section is not required for doing Software Lab 8.1.

Consider the example depicted in Figures 8.6 and 8.7, where the overall sampled response (at rate 2/T) to a single symbol is assumed to be

$$h = (\ldots, 0, -0.5, 1, 0.5, -0.25, 0, \ldots)$$

Consider an observation interval (i.e., equalizer length) of length L = 4, aligned with the response to the desired symbol as depicted in Figures 8.6 and 8.7. As shown in code fragment 8.2.1, we can also choose smaller or larger observation intervals, and optimize their alignment using some criterion (in the code fragment, the criterion is maximizing the energy of the desired response falling into the observation interval). In addition to the contribution to r[n] due to b[n], we also have contributions from other symbols before and after it in the sequence, corresponding to parts of appropriately shifted versions of the response h. For example, the response to b[n + 1] falling in the nth observation interval is obtained by shifting h by q = 2 and then windowing. The received vector r[n] can therefore be written as follows.

Model for L = 4: Two interfering symbols fall into the observation interval. The observation interval is large enough to accommodate the entire response due to the desired symbol.

$$r[n] = b[n]\begin{pmatrix} -0.5 \\ 1 \\ 0.5 \\ -0.25 \end{pmatrix} + b[n+1]\begin{pmatrix} 0 \\ 0 \\ -0.5 \\ 1 \end{pmatrix} + b[n-1]\begin{pmatrix} 0.5 \\ -0.25 \\ 0 \\ 0 \end{pmatrix} + w[n] \tag{8.10}$$

where our convention is that time progresses downward, and where w[n] denotes noise. The vector multiplying b[n] is the desired vector, while the others are interference vectors. Figure 8.6 corresponds to n = 0, while Figure 8.7 corresponds to n = 1.

In order to obtain the preceding model, the vector corresponding to a given symbol is obtained by appropriately shifting h, and then windowing to the observation interval. To make the modeling approach clear, we also provide the model for L = 3, where the observation interval is lined up with the first three elements of the response to the desired symbol, and for L = 6, where the observation interval contains one additional sample on either side of the response to the desired symbol.

Model for L = 3: The observation interval is smaller than the desired symbol response. Two interfering symbols fall in the interval.

$$r[n] = b[n]\begin{pmatrix} -0.5 \\ 1 \\ 0.5 \end{pmatrix} + b[n+1]\begin{pmatrix} 0 \\ 0 \\ -0.5 \end{pmatrix} + b[n-1]\begin{pmatrix} 0.5 \\ -0.25 \\ 0 \end{pmatrix} + w[n] \tag{8.11}$$


Model for L = 6: The observation interval is larger than the desired symbol response. Four interfering symbols fall in the interval.

$$r[n] = b[n]\begin{pmatrix} 0 \\ -0.5 \\ 1 \\ 0.5 \\ -0.25 \\ 0 \end{pmatrix} + b[n+1]\begin{pmatrix} 0 \\ 0 \\ 0 \\ -0.5 \\ 1 \\ 0.5 \end{pmatrix} + b[n-1]\begin{pmatrix} 1 \\ 0.5 \\ -0.25 \\ 0 \\ 0 \\ 0 \end{pmatrix} + b[n+2]\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ -0.5 \end{pmatrix} + b[n-2]\begin{pmatrix} -0.25 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + w[n] \tag{8.12}$$
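The shift-and-window recipe can be written as a short Python function. In this sketch (hypothetical names), `start` is the index into h of the first sample in the observation window, so start = 0 gives the L = 4 and L = 3 models and start = −1 gives the L = 6 model; the response to b[n + k] is h delayed by qk samples.

```python
def isi_vectors(h, L, q, start, kmax=3):
    """Desired and interference vectors u_k obtained by shifting and windowing h.

    h     -- sampled response to one symbol at rate q/T
    L     -- observation interval length in samples
    start -- index into h of the first sample in the window (may be negative)
    Returns {k: u_k} for every symbol b[n+k] with a nonzero contribution."""
    def h_at(i):
        return h[i] if 0 <= i < len(h) else 0.0

    vectors = {}
    for k in range(-kmax, kmax + 1):
        u = [h_at(start + j - q * k) for j in range(L)]  # b[n+k] delayed by q*k
        if any(u):
            vectors[k] = u
    return vectors


h = [-0.5, 1.0, 0.5, -0.25]
model4 = isi_vectors(h, L=4, q=2, start=0)   # (8.10)
model3 = isi_vectors(h, L=3, q=2, start=0)   # (8.11)
model6 = isi_vectors(h, L=6, q=2, start=-1)  # (8.12)
for k in sorted(model6):
    print(k, model6[k])
```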


Vector model for ISI: In general, we can write the received vector over observation interval n as follows:

$$r[n] = b[n]\,u_0 + \sum_{k \neq 0} b[n+k]\,u_k + w[n] \tag{8.13}$$

where b[n], $u_0$ are the desired symbol and vector, respectively; b[n + k], $u_k$ for $k \neq 0$ are interference symbols and vectors, respectively; and $w[n] \sim N(0, C_w)$ denotes the vector of noise samples at the output of the receive filter, windowed to the current observation interval. For an equalizer working with rate q/T samples, we have already noted that successive observation intervals are offset by q samples. Clearly, the structure of the ISI remains the same as we go from observation interval n to n + 1, but the roles of the symbols are shifted by one: for the (n + 1)th observation interval, b[n + 1] is the desired symbol multiplying $u_0$, while b[n + 1 + k] for $k \neq 0$ is the interfering symbol multiplying $u_k$.

Modeling the output of a linear correlator: A linear correlator c operating on the received vector produces the following output:

$$c^T r[n] = b[n]\,c^T u_0 + \sum_{k \neq 0} b[n+k]\,c^T u_k + c^T w[n] \tag{8.14}$$


where the first term is the desired term, the second term is the ISI at the correlator output, and the third term is the noise at the correlator output. While the ultimate performance metric of interest is the error probability, a convenient metric that is easy to compute is the signal-to-interference-plus-noise ratio (SINR) at the output of the linear correlator, defined as the ratio of the average energy of the desired term to that of the undesired terms:

$$\mathrm{SINR} = \frac{E\left[\left|b[n]\,c^T u_0\right|^2\right]}{E\left[\left|\sum_{k \neq 0} b[n+k]\,c^T u_k + c^T w[n]\right|^2\right]} \tag{8.15}$$


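Assuming that the symbols are uncorrelated with $E[|b[n]|^2] = \sigma_b^2$ and are independent of the noise, we obtain the following expression for the SINR:

$$\mathrm{SINR} = \frac{\sigma_b^2 \left|c^T u_0\right|^2}{\sigma_b^2 \sum_{k \neq 0} \left|c^T u_k\right|^2 + c^T C_w c} \tag{8.16}$$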
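For the L = 4 example, R and p follow directly from (8.13): with uncorrelated unit-energy symbols and white noise $C_w = \sigma^2 I$, we get $R = \sum_k u_k u_k^T + \sigma^2 I$ (sum including k = 0) and $p = u_0$. The Python sketch below (standard library only; σ² = 0.1 is an illustrative assumption, names are hypothetical) computes $c_{MMSE} = R^{-1}p$ and evaluates (8.16). It also checks the standard fact that the MMSE correlator is a scalar multiple of $R_I^{-1}u_0$, where $R_I$ is the interference-plus-noise covariance, so it attains the maximum SINR $u_0^T R_I^{-1} u_0$; the matched filter $c = u_0$ is shown for comparison.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


u0 = [-0.5, 1.0, 0.5, -0.25]
interferers = [[0.0, 0.0, -0.5, 1.0],   # u_1, multiplies b[n+1]
               [0.5, -0.25, 0.0, 0.0]]  # u_{-1}, multiplies b[n-1]
sigma2 = 0.1  # assumed white-noise variance per sample, C_w = sigma2 * I
L = len(u0)


def outer_sum(vectors, diag):
    """sum_k v_k v_k^T + diag * I, as a list of lists."""
    return [[sum(v[i] * v[j] for v in vectors) + (diag if i == j else 0.0)
             for j in range(L)] for i in range(L)]


def sinr(c):
    """Output SINR (8.16) for unit-energy symbols and C_w = sigma2 * I."""
    isi = sum(dot(c, u) ** 2 for u in interferers)
    return dot(c, u0) ** 2 / (isi + sigma2 * dot(c, c))


R = outer_sum([u0] + interferers, sigma2)  # E[r r^T]
c_mmse = solve(R, u0)                       # p = E[b r] = u0
R_I = outer_sum(interferers, sigma2)        # interference-plus-noise covariance
sinr_max = dot(u0, solve(R_I, u0))

print("MMSE correlator:", [round(x, 4) for x in c_mmse])
print("SINR (MMSE):", round(sinr(c_mmse), 4), "max:", round(sinr_max, 4))
print("SINR (matched filter c = u0):", round(sinr(u0), 4))
```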


Choosing c to minimize the MSE (8.3) means that we would like to have $c^T r[n] \approx b[n]$. This means that, if the linear MMSE equalizer is working, then $c^T u_0 \approx 1$, and the ISI terms $c^T u_k$, $k \neq 0$, and the output noise variance $c^T C_w c$ are small. The MMSE criterion represents a tradeoff between ISI and noise at the output. To see why, let us consider the closely related criterion of zero-forcing equalization. While the noise in the example considered in our code fragments is colored, let us first consider white noise for simplicity: $w[n] \sim N(0, \sigma^2 I)$, so that the output noise $c^T w[n] \sim N(0, \sigma^2 \|c\|^2)$.


Figure 8.8: The zero-forcing correlator projects the received signal along $P_I^\perp u_0$, the projection of the desired signal vector orthogonal to the interference subspace.

The geometry of zero-forcing equalization: The zero-forcing (ZF) equalizer is a linear equalizer chosen to set the ISI terms at the output exactly to zero:

$$c^T u_k = 0, \quad k \neq 0 \tag{8.17}$$

while scaling the desired term to the right level:

$$c^T u_0 = 1 \tag{8.18}$$

The first condition (8.17) means that c must be orthogonal to the interference subspace, which is our term for the subspace spanned by the interference vectors $\{u_k, k \neq 0\}$. If (8.17) is satisfied, then the second condition (8.18) can only be satisfied if the desired vector $u_0$ does not lie in the interference subspace; otherwise we would have $c^T u_0 = 0$ (why?). Thus, the zero-forcing equalizer exists only if the desired vector $u_0$ is linearly independent of the interference vectors $\{u_k, k \neq 0\}$, in which case it has a nonzero component $P_I^\perp u_0$ orthogonal to the interference subspace, as shown in Figure 8.8. In this case, if we choose c to be a scalar multiple of this orthogonal component, then (8.17) is satisfied by construction, and (8.18) can be satisfied by choosing the scale factor appropriately, as discussed shortly. And indeed, while the solution to (8.17) and (8.18) need not be unique, it can be shown (see Problem 8.5) that choosing $c_{ZF} = \alpha P_I^\perp u_0$ is optimal (in terms of minimizing error probability or maximizing SNR) among all possible ZF solutions, assuming that the noise vector w[n] is white Gaussian. As we shall see, the performance of the ZF correlator depends on the length of this orthogonal projection $P_I^\perp u_0$ relative to that of the desired signal vector $u_0$: the smaller this relative length, the poorer the performance.

For the model (8.10) for an equalizer of length L = 4, the signal vectors live in a space of dimension 4, with 2 interference vectors. It is quite clear that the desired vector is indeed linearly independent of the interference vectors, and we expect the ZF correlator to exist. For the model (8.10) for L = 3, we again have 2 interference vectors, and it again appears that the ZF correlator should exist, although we would expect the performance to be poorer because the relative length of the orthogonal projection can be expected to be smaller. Of course, such intuition must be quantified by explicit computation of the ZF correlator and its performance, which we discuss next.

Computation of the ZF correlator: Let us now obtain an explicit expression for the ZF correlator given the vector ISI model (8.13). Suppose that the signal vectors $\{u_k\}$ are written as columns in a matrix U as follows:

$$U = [\,\cdots\; u_{-1}\; u_0\; u_1\; \cdots\,] \tag{8.19}$$

The ZF conditions (8.17)-(8.18) can be compactly written as

$$U^T c_{ZF} = e \tag{8.20}$$

where $e = (\ldots, 0, 1, 0, \ldots)^T$ is a unit vector with a one corresponding to the column $u_0$ and zeros corresponding to the columns $u_k$, $k \neq 0$. Further, we can write the ZF correlator as a linear combination of the signal vectors (any component orthogonal to all of the $\{u_k\}$ can only add noise):


$$c_{ZF} = U a \tag{8.21}$$

Plugging into (8.20), we obtain

$$U^T U a = e$$

so that

$$a = (U^T U)^{-1} e$$

assuming invertibility, which in turn requires that the signal vectors $\{u_k\}$ are linearly independent (see Problem 8.6). Substituting into (8.21), we obtain

$$c_{ZF} = U (U^T U)^{-1} e, \quad \text{ZF correlator for white noise} \tag{8.22}$$


Noise enhancement: By “looking” along the direction of the orthogonal component $P_I^\perp u_0$ shown in Figure 8.8, the ZF equalizer nulls out the interference vectors. When we plug in (8.17)-(8.18), the output SINR expression in (8.16) reduces to the output SNR. Setting $C_w = \sigma^2 I$, we obtain

$$\mathrm{SNR}_{ZF} = \frac{\sigma_b^2}{\sigma^2 \|c_{ZF}\|^2}, \quad \text{ZF SNR for white noise} \tag{8.23}$$

On the other hand, if we ignore ISI, then we know from Chapter 6 that the optimal correlator in AWGN is a scalar multiple of the desired vector $u_0$. Relative to this “matched filter” solution, the ZF correlator incurs a loss in SNR, termed noise enhancement. The reason that we say the noise is getting enhanced (as opposed to the signal getting reduced) is that, if we scale the correlator to keep the desired contribution at the output constant as in (8.18), then the degradation in SNR corresponds to an increase in the noise variance. For an ideal system with no ISI, the received vector is given by $r[n] = b[n] u_0 + w[n]$. Setting $c = u_0$, we have $c^T r[n] = b[n] \|u_0\|^2 + N(0, \sigma^2 \|u_0\|^2)$, from which it is easy to see that the output SNR is given by

$$\mathrm{SNR}_{MF} = \frac{\sigma_b^2 \|u_0\|^2}{\sigma^2}, \quad \text{matched filter bound for white noise} \tag{8.24}$$


This is termed the matched filter (MF) bound on SNR, and is an unrealizable (because we have ignored ISI) benchmark against which we can compare the performance of linear equalization strategies. In particular, the noise enhancement $\zeta$ can be defined as the ratio by which the ZF SNR is smaller than the MF benchmark:

$$\zeta = \frac{\mathrm{SNR}_{MF}}{\mathrm{SNR}_{ZF}} = \|u_0\|^2 \|c_{ZF}\|^2, \quad \text{noise enhancement for white noise} \tag{8.25}$$
Let us first interpret this geometrically. Setting $c_{ZF} = \alpha P_I^\perp u_0$, the condition (8.18) corresponds to

$$1 = \langle c_{ZF}, u_0 \rangle = \alpha \langle P_I^\perp u_0, u_0 \rangle = \alpha \|P_I^\perp u_0\|^2 \tag{8.26}$$

The last equality follows because $u_0$ decomposes into its projection onto the interference subspace, $P_I u_0$, and its orthogonal projection $P_I^\perp u_0$. Since these two components are orthogonal by definition, we have

$$\langle P_I^\perp u_0, u_0 \rangle = \langle P_I^\perp u_0, P_I u_0 \rangle + \langle P_I^\perp u_0, P_I^\perp u_0 \rangle = 0 + \|P_I^\perp u_0\|^2$$


We see from (8.26) that

$$\alpha = \frac{1}{\|P_I^\perp u_0\|^2}$$


In other words, a ZF correlator satisfying (8.17)-(8.18) can be written in terms of the projection of the desired vector orthogonal to the interference subspace as follows:

$$c_{ZF} = \frac{P_I^\perp u_0}{\|P_I^\perp u_0\|^2} \tag{8.27}$$

from which it follows that

$$\|c_{ZF}\|^2 = \frac{1}{\|P_I^\perp u_0\|^2} \tag{8.28}$$


Thus, the smaller the orthogonal projection $P_I^\perp u_0$, the more we must scale up the correlator in order to maintain the normalization (8.18) of the contribution of the desired symbol at the output. Plugging into (8.25), we obtain the following geometric interpretation for the noise enhancement:

$$\zeta = \frac{\mathrm{SNR}_{MF}}{\mathrm{SNR}_{ZF}} = \frac{\|u_0\|^2}{\|P_I^\perp u_0\|^2} \tag{8.29}$$

This is intuitively reasonable: the noise enhancement is the inverse of the factor by which the effective signal energy is reduced because of looking along the orthogonal projection $P_I^\perp u_0$ instead of along the desired vector $u_0$.
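The geometric recipe (8.26)-(8.29) can be checked numerically. The sketch below, in Python/NumPy as a stand-in for Matlab, computes $P_I^\perp u_0$ by subtracting the least-squares projection of $u_0$ onto the span of the interference vectors, builds $c_{ZF}$ from (8.27), and evaluates (8.29) for the example signal vectors of Code Fragment 8.2.2. It is a sketch for checking the algebra, not the book's method; the result should agree with $U(U^T U)^{-1} e$ from (8.22).

```python
import numpy as np

# Example signal vectors (columns u_{-1}, u_0, u_1), as in Code Fragment 8.2.2
U = np.array([[0.5, -0.25, 0, 0],
              [-0.5, 1, 0.5, -0.25],
              [0, 0, -0.5, 1]]).T
u0 = U[:, 1]
UI = np.delete(U, 1, axis=1)            # interference vectors u_k, k != 0

# Projection of u0 onto the interference subspace, via least squares
coef, *_ = np.linalg.lstsq(UI, u0, rcond=None)
PI_u0 = UI @ coef
Pperp_u0 = u0 - PI_u0                   # component orthogonal to the interference subspace

# ZF correlator (8.27) and noise enhancement (8.29)
c_zf = Pperp_u0 / (Pperp_u0 @ Pperp_u0)
zeta = (u0 @ u0) / (Pperp_u0 @ Pperp_u0)

print(U.T @ c_zf)                       # ZF conditions (8.17)-(8.18): close to [0, 1, 0]
print(10 * np.log10(zeta), "dB")
```

For these vectors, $\|u_0\|^2 = 1.5625$ and $\|P_I^\perp u_0\|^2 = 0.5625$, so $\zeta = 25/9$, about 4.4 dB.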


The  following  code  fragment  computes  the  ZF  correlator  and  the  noise  enhancement  for  the 
model  (8.10).  We  find  that  the  noise  enhancement  is  4.4  dB. 


Code Fragment 8.2.2 Computing the ZF solution and its noise enhancement
%ZF example
%matrix with signal vectors as columns
U=transpose([0.5 -0.25 0 0;-0.5 1 0.5 -0.25;0 0 -0.5 1]);
%unit vector with one corresponding to u0
e=transpose([0 1 0]);
%coeffs of linear combination
a=(U'*U)\e;
%ZF correlator: linear comb of cols of U
czf=U*a;
%desired vector is second column
u0=U(:,2);
%check that ZF equations are satisfied
U'*czf %should be equal to the vector e
%noise enhancement
noise_enhancement=(u0'*u0)*(czf'*czf)
%in dB
noise_enhancement_db=10*log10(noise_enhancement)
While the matrix U is specified manually in the preceding code fragment, for longer channels we would typically automate the generation of U given the channel impulse response h, the equalizer length L, the oversampling factor q, and the specification of how the observation interval lines up with the response to the desired symbol (i.e., how to generate $u_0$ from h).
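One way to automate the construction of U is sketched below in Python/NumPy. The helper `build_U` and its `offset` parameter are hypothetical, and the alignment rule it encodes, $u_k[i] = h[i + \text{offset} - qk]$ (zero outside the support of h), is an assumption about how the observation interval lines up with the symbol responses. With h = (-0.5, 1, 0.5, -0.25), L = 4, q = 2 and offset 0, it reproduces the matrix U entered by hand in Code Fragment 8.2.2.

```python
import numpy as np

def build_U(h, L, q, offset=0):
    """Hypothetical helper: signal vectors u_k (as columns of U) for the vector ISI model.

    Assumed alignment: u_k[i] = h[i + offset - q*k], i = 0..L-1, and zero when the
    index falls outside the support of h. Only k whose vector overlaps the support
    of h are kept. Returns U and the column index of u_0.
    """
    h = np.asarray(h, dtype=float)
    Lh = len(h)
    kmin = -((Lh - 1 - offset) // q)     # ceil((offset - Lh + 1) / q)
    kmax = (offset + L - 1) // q         # floor((offset + L - 1) / q)
    ks = list(range(kmin, kmax + 1))
    U = np.zeros((L, len(ks)))
    for col, k in enumerate(ks):
        for i in range(L):
            j = i + offset - q * k       # index into the channel response
            if 0 <= j < Lh:
                U[i, col] = h[j]
    return U, ks.index(0)

# Reproduce the hand-entered matrix of Code Fragment 8.2.2
U, k0 = build_U([-0.5, 1, 0.5, -0.25], L=4, q=2)
print(U)
print("desired column:", k0)
```

The returned index `k0` plays the role of the position of the one in the unit vector e in (8.20).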

ZF correlator for colored noise: Let us now discuss how to generalize the expressions for the ZF correlator and its noise enhancement to colored noise, where w[n] has covariance matrix $C_w$ (assumed to be strictly positive definite, and hence invertible). We limit ourselves here to stating the results; guidance for deriving these results is provided in Problem 8.9. The optimal ZF solution, in terms of maximizing the output SNR while satisfying (8.17)-(8.18), is given by

$$c_{ZF} = C_w^{-1} U \left(U^T C_w^{-1} U\right)^{-1} e, \quad \text{ZF correlator for colored noise} \tag{8.30}$$


The corresponding SNR is given by

$$\mathrm{SNR}_{ZF} = \frac{\sigma_b^2}{c_{ZF}^T C_w c_{ZF}}, \quad \text{ZF SNR for colored noise} \tag{8.31}$$


If there were no ISI, then the optimal correlator would be the whitened matched filter $c = C_w^{-1} u_0$, and the corresponding matched filter bound on SNR is given by

$$\mathrm{SNR}_{MF} = \sigma_b^2\, u_0^T C_w^{-1} u_0, \quad \text{matched filter bound for colored noise} \tag{8.32}$$

Proceeding as before, the noise enhancement is given by

$$\zeta = \frac{\mathrm{SNR}_{MF}}{\mathrm{SNR}_{ZF}} = \left(u_0^T C_w^{-1} u_0\right)\left(c_{ZF}^T C_w c_{ZF}\right), \quad \text{noise enhancement for colored noise} \tag{8.33}$$


The reader is encouraged to check that, when we set $C_w = \sigma^2 I$ in the preceding expressions, we recover the expressions derived earlier for white noise.
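Expressions (8.30)-(8.33) translate directly into code. The Python/NumPy sketch below uses `np.linalg.solve` instead of forming $C_w^{-1}$ explicitly. The exponentially correlated covariance $C_w[i,j] = \sigma^2 \rho^{|i-j|}$ with $\sigma^2 = 0.1$, $\rho = 0.5$ is an assumed example, not the colored noise used in the book's code fragments; the last lines check that setting $C_w = \sigma^2 I$ recovers the white-noise value $\|u_0\|^2 \|c_{ZF}\|^2 = 25/9$ for the example vectors.

```python
import numpy as np

def zf_colored(U, k0, Cw, sigma_b2=1.0):
    """ZF correlator (8.30), its SNR (8.31) and noise enhancement (8.33)."""
    e = np.zeros(U.shape[1])
    e[k0] = 1.0
    CinvU = np.linalg.solve(Cw, U)                # C_w^{-1} U, without an explicit inverse
    c = CinvU @ np.linalg.solve(U.T @ CinvU, e)   # (8.30)
    snr_zf = sigma_b2 / (c @ Cw @ c)              # (8.31)
    u0 = U[:, k0]
    snr_mf = sigma_b2 * (u0 @ np.linalg.solve(Cw, u0))   # (8.32)
    return c, snr_zf, snr_mf / snr_zf

U = np.array([[0.5, -0.25, 0, 0],
              [-0.5, 1, 0.5, -0.25],
              [0, 0, -0.5, 1]]).T

# Assumed colored noise: C_w[i, j] = sigma^2 * rho^|i - j|
sigma2, rho = 0.1, 0.5
idx = np.arange(U.shape[0])
Cw = sigma2 * rho ** np.abs(np.subtract.outer(idx, idx))

c_zf, snr_zf, zeta = zf_colored(U, 1, Cw)
print(U.T @ c_zf)                      # ZF conditions: close to [0, 1, 0]
print(10 * np.log10(zeta), "dB")

# Sanity check: white noise recovers ||u0||^2 ||c_ZF||^2 from (8.25)
_, _, zeta_white = zf_colored(U, 1, sigma2 * np.eye(U.shape[0]))
print(zeta_white)                      # 25/9 for these vectors
```

By the Cauchy-Schwarz inequality the noise enhancement can never be smaller than one, which offers a simple check on any implementation.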

MMSE correlator: While we have seen how to adaptively implement the MMSE equalizer, if we are given the vector ISI model (8.13), then we can compute the MMSE solution analytically (see Problem 8.8 for the derivation) as follows:

$$c_{MMSE} = R^{-1} p, \quad R = \sigma_b^2 U U^T + C_w, \quad p = \sigma_b^2 u_0 \tag{8.34}$$


We  state  without  proof  the  following  results: 

(1)  Among  the  class  of  linear  correlators,  the  MMSE  correlator  is  optimal  in  terms  of  SINR. 
Thus,  it  achieves  the  best  tradeoff  between  the  ISI  and  noise  at  the  output,  attaining  an  SINR 
that  is  better  than  the  SNR  attained  by  the  ZF  correlator  (for  the  ZF  correlator,  the  SINR  equals 
the  SNR,  since  there  is  no  residual  ISI  at  the  output). 

(2) The MMSE correlator tends to the ZF correlator (if the latter exists) as the noise variance, or more generally, the noise covariance matrix, tends to zero. This makes sense: if we can neglect noise, then the MSE $E[|c^T r[n] - b[n]|^2]$ can be driven to zero by forcing the ISI to zero as in (8.17) and by scaling the desired contribution according to (8.18), since we then obtain $c^T r[n] = b[n]$.
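Both statements can be illustrated (not proved) numerically. The following Python/NumPy sketch computes $c_{MMSE}$ from (8.34) and $c_{ZF}$ from (8.22) for the example signal vectors, assuming white noise of several variances and $\sigma_b^2 = 1$, and compares their SINRs via (8.16) together with the distance between the two correlators. Since the SINR is invariant to scaling of c, the comparison does not depend on how the MMSE correlator happens to be normalized.

```python
import numpy as np

U = np.array([[0.5, -0.25, 0, 0],
              [-0.5, 1, 0.5, -0.25],
              [0, 0, -0.5, 1]]).T
k0, sigma_b2 = 1, 1.0                 # desired column, assumed symbol energy
e = np.eye(U.shape[1])[k0]

def sinr(c, Cw):
    """Output SINR (8.16) of a real-valued correlator c."""
    g = U.T @ c
    isi = np.sum(g ** 2) - g[k0] ** 2
    return sigma_b2 * g[k0] ** 2 / (sigma_b2 * isi + c @ Cw @ c)

c_zf = U @ np.linalg.solve(U.T @ U, e)           # (8.22)

results = []
for sigma2 in [1.0, 0.1, 0.01, 1e-6]:            # assumed white-noise levels
    Cw = sigma2 * np.eye(U.shape[0])
    R = sigma_b2 * U @ U.T + Cw                  # (8.34)
    p = sigma_b2 * U[:, k0]
    c_mmse = np.linalg.solve(R, p)
    results.append((sigma2, sinr(c_mmse, Cw), sinr(c_zf, Cw),
                    np.linalg.norm(c_mmse - c_zf)))

for sigma2, s_mmse, s_zf, dist in results:
    print(f"sigma2={sigma2:g}: SINR_MMSE={10 * np.log10(s_mmse):.2f} dB, "
          f"SINR_ZF={10 * np.log10(s_zf):.2f} dB, ||c_MMSE - c_ZF||={dist:.2e}")
```

At high noise levels the gap between the two SINRs is largest; as the noise variance shrinks, the two correlators converge, consistent with statement (2).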

We  summarize  as  follows.  The  zero-forcing  equalizer  drives  the  ISI  to  zero,  while  the  linear 
MMSE  equalizer  trades  off  ISI  and  noise  at  its  output  so  as  to  maximize  the  SINR.  For  large 
SNR,  the  contribution  of  the  ISI  is  dominant,  and  the  MMSE  equalizer  tends  in  the  limit  to  the 
zero-forcing  equalizer  (if  it  exists),  and  hence  pays  the  same  asymptotic  penalty  in  terms  of  noise 
enhancement.  In  practice,  the  MMSE  equalizer  often  performs  significantly  better  than  the  ZF 
equalizer  at  moderate  SNRs,  but  in  order  to  improve  its  performance  at  high  SNR,  one  must 
look  to  nonlinear  equalization  strategies,  which  are  beyond  our  present  scope. 
Extension to complex-valued signals: All of the preceding development applies to complex-valued constellations and signals, except that vector transposes $x^T$ are replaced by conjugate transposes $x^H$, and the noise covariance matrix must include the effect of both the real and imaginary parts of the noise.

Noise model: In order to model complex-valued WGN, we set $C_w = 2\sigma^2 I$. This can be generated by setting the entries of Re(w) and Im(w) to be i.i.d. $N(0, \sigma^2)$. More generally, we consider circularly symmetric, zero mean, complex Gaussian noise vectors w, which are completely characterized by their complex covariance matrix,

$$C_w = E\left[(w - E[w])(w - E[w])^H\right] = E\left[w w^H\right]$$

We use the notation $w \sim CN(0, C_w)$. Detailed discussion of circularly symmetric Gaussian random vectors would distract us from our present purpose. Suffice it to say that circular symmetry and Gaussianity are preserved under linear transformations. The covariance matrix evolves as follows: if $w = B\tilde{w}$, then $C_w = B C_{\tilde{w}} B^H$. Thus, we can generate colored circularly symmetric Gaussian noise w by passing complex WGN through a linear transformation. Specifically, if we can write $C_w = B B^H$ (this can always be done for a positive definite matrix), we can generate w as $w = B\tilde{w}$, where $\tilde{w} \sim CN(0, I)$.
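The generation recipe $w = B\tilde{w}$ can be sketched in Python/NumPy using a Cholesky factor, which is one valid choice of B with $C_w = BB^H$ for a positive definite matrix (other square roots work too). The target covariance here is an arbitrary Hermitian positive definite matrix chosen for illustration, and the sample covariance of many realizations is compared against it.

```python
import numpy as np

rng = np.random.default_rng(0)
N, num = 4, 200_000                      # vector length, number of noise realizations

# Hypothetical Hermitian positive definite target covariance (not from the text)
A = 0.5 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
Cw = A @ A.conj().T + 0.1 * np.eye(N)

B = np.linalg.cholesky(Cw)               # lower triangular B with Cw = B B^H

# Complex WGN w~ ~ CN(0, I): real and imaginary parts i.i.d. N(0, 1/2)
w_tilde = (rng.standard_normal((N, num)) + 1j * rng.standard_normal((N, num))) / np.sqrt(2)
w = B @ w_tilde                          # colored noise with covariance B I B^H = Cw

C_hat = w @ w.conj().T / num             # sample estimate of E[w w^H]
print(np.max(np.abs(C_hat - Cw)))        # small; shrinks roughly like 1/sqrt(num)
```

The factor $1/\sqrt{2}$ matters: $CN(0, I)$ puts half of the unit variance in each of the real and imaginary parts, matching $C_w = 2\sigma^2 I$ with $\sigma^2 = 1/2$.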

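The expressions for the ZF and MMSE correlators are as follows:

$$c_{ZF} = C_w^{-1} U \left(U^H C_w^{-1} U\right)^{-1} e, \quad \text{ZF correlator for complex-valued signals}$$

$$c_{MMSE} = R^{-1} p, \quad \text{MMSE correlator for complex-valued signals}$$

$$\left(R = \sigma_b^2 U U^H + C_w, \quad p = \sigma_b^2 u_0, \quad \sigma_b^2 = E\left[|b[n]|^2\right]\right)$$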

MMSE and SINR: While the SINR for any linear correlator can be computed as in (8.16), we can obtain particularly simple expressions for the MSE and SINR achieved by the MMSE correlator, as follows.

$$\mathrm{MMSE} = \sigma_b^2 - p^H c_{MMSE} = \sigma_b^2 - p^H R^{-1} p$$

$$\mathrm{SINR}_{max} = \frac{\sigma_b^2}{\mathrm{MMSE}} - 1 \tag{8.36}$$
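The closed forms above can be cross-checked against the general SINR formula (8.16), with transposes replaced by conjugate transposes. In this Python/NumPy sketch the complex signal vectors are random placeholders rather than a channel from the text, and the noise is complex WGN with $C_w = 2\sigma^2 I$; the point is only that $\sigma_b^2/\mathrm{MMSE} - 1$ matches the SINR computed directly for $c_{MMSE}$.

```python
import numpy as np

rng = np.random.default_rng(1)
L, K = 6, 3                              # hypothetical correlator length, number of signal vectors
U = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / np.sqrt(2)
k0, sigma_b2, sigma2 = 1, 1.0, 0.2       # assumed desired index, symbol energy, noise level
Cw = 2 * sigma2 * np.eye(L)              # complex WGN: C_w = 2 sigma^2 I

R = sigma_b2 * U @ U.conj().T + Cw       # R = sigma_b^2 U U^H + C_w
p = sigma_b2 * U[:, k0]                  # p = sigma_b^2 u_0
c_mmse = np.linalg.solve(R, p)

mmse = sigma_b2 - np.real(np.vdot(p, c_mmse))   # sigma_b^2 - p^H R^{-1} p
sinr_max = sigma_b2 / mmse - 1                  # (8.36)

# Direct evaluation of (8.16) with conjugate transposes, for comparison
g = U.conj().T @ c_mmse                  # u_k^H c = conj(c^H u_k); only magnitudes are used
isi = np.sum(np.abs(g) ** 2) - np.abs(g[k0]) ** 2
noise = np.real(np.vdot(c_mmse, Cw @ c_mmse))   # c^H C_w c
sinr_direct = sigma_b2 * np.abs(g[k0]) ** 2 / (sigma_b2 * isi + noise)
print(sinr_max, sinr_direct)             # the two values should agree
```

`np.vdot` conjugates its first argument, which is exactly the $p^H(\cdot)$ and $c^H(\cdot)$ structure needed here.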


8.3  Orthogonal  Frequency  Division  Multiplexing 

We now introduce an alternative approach to communication over dispersive channels whose goal is to isolate symbols from each other for any dispersive channel. The idea is to employ frequency domain transmission, sending symbols B[n] using complex exponentials $s_n(t) = e^{j2\pi f_n t}$, which have two key properties:

P1) When $s_n(t)$ goes through an LTI system with impulse response h(t) and transfer function H(f), the output is a scalar multiple of $s_n(t)$. Specifically,

$$e^{j2\pi f_n t} \rightarrow H(f_n)\, e^{j2\pi f_n t}$$

P2) Complex exponentials at different frequencies are orthogonal:

$$\langle s_n, s_m \rangle = \int_{-\infty}^{\infty} s_n(t)\, s_m^*(t)\, dt = \int_{-\infty}^{\infty} e^{j2\pi (f_n - f_m) t}\, dt = 0, \quad f_n \neq f_m$$

This is analogous to the properties of eigenvectors of matrices. Thus, complex exponentials are eigenfunctions of any LTI system, as already pointed out in Chapter 2.
Conceptual basis for OFDM: For frequency domain transmission with symbol B[n] modulating the complex exponential $s_n(t) = e^{j2\pi f_n t}$, the transmitted signal is given by

$$u(t) = \sum_n B[n]\, e^{j2\pi f_n t}$$

When this goes through a dispersive channel h(t), we obtain (ignoring noise)

$$y(t) = \sum_n B[n]\, H(f_n)\, e^{j2\pi f_n t}$$

Note that the symbols {B[n]} do not interfere with each other after passing through the channel, since different complex exponentials are orthogonal. Furthermore, regardless of how complicated the time domain channel h(t) is, we have managed to parallelize the problem of equalization by going to the frequency domain. Thus, we only need to estimate and compensate for the complex scalar $H(f_n)$ in demodulating the nth symbol. We now discuss how to translate this concept into practice.

Finite signaling interval: The first step is to constrain the signaling interval, say to length T. The complex baseband transmitted signal is therefore given by

$$u(t) = \sum_{n=0}^{N-1} B[n]\, p_n(t) = \sum_{n=0}^{N-1} B[n]\, e^{j2\pi f_n t} I_{[0,T]}(t) \tag{8.37}$$

where B[n] is the symbol transmitted using the modulating signal $p_n(t) = e^{j2\pi f_n t} I_{[0,T]}(t)$, i.e., using the nth subcarrier at frequency $f_n$. Let us now see how the properties P1 and P2 are affected by time limiting. The time-limited tone $p_n(t)$ has Fourier transform $P_n(f) = T\, \mathrm{sinc}((f - f_n)T)\, e^{-j\pi (f - f_n) T}$, which decays quickly as $|f - f_n|$ takes on values of the order of 1/T. For a channel whose impulse response h(t) is approximately timelimited to $T_d$ (the channel delay spread), the transfer function is approximately constant over frequency intervals of length $B_c$ (the channel coherence bandwidth), roughly inversely proportional to $T_d$. If the signaling interval is large compared to the channel delay spread ($T \gg T_d$), then 1/T is small compared to the channel coherence bandwidth ($1/T \ll B_c$), so that the gain seen by $P_n(f)$ is roughly constant, and the eigenfunction property is roughly preserved. That is, when $P_n(f)$ goes through a channel with transfer function H(f), the output is

$$Q_n(f) = H(f) P_n(f) \approx H(f_n) P_n(f) \tag{8.38}$$

Regarding the orthogonality property P2, two complex exponentials that are constrained to an interval of length T are orthogonal if the frequency separation is an integer multiple of 1/T:

$$\int_0^T e^{j2\pi f_n t} e^{-j2\pi f_m t}\, dt = \frac{e^{j2\pi (f_n - f_m) T} - 1}{j2\pi (f_n - f_m)} = 0, \quad \text{for } (f_n - f_m)T = \text{nonzero integer} \tag{8.39}$$

Thus, if we wish to send N symbols in parallel using N subcarriers (the term used for each time-constrained complex exponential), we need a bandwidth of roughly N/T in order to preserve orthogonality among the timelimited tones. Of course, even if we enforce orthogonality in this fashion, the timelimited tones are not eigenfunctions of LTI systems, so the output corresponding to the nth timelimited tone is not just a scalar multiple of itself. However, using (8.38), we can approximate the channel output for the nth timelimited tone as $H(f_n)\, e^{j2\pi f_n t} I_{[0,T]}(t)$. Thus, the output corresponding to the transmitted signal (8.37) can be approximated as follows:

$$y(t) \approx \sum_{n=0}^{N-1} B[n]\, H(f_n)\, e^{j2\pi f_n t} I_{[0,T]}(t) + n(t) \tag{8.40}$$
To summarize, once we limit the signaling duration to be finite, the ISI avoidance property of OFDM is approximate rather than exact. However, as we now discuss, orthogonality between subcarriers can be restored exactly in digital implementations of OFDM. Before discussing such implementations, we provide some background on discrete time signal processing.


8.3.1  DSP-centric  implementation 

The proliferation of OFDM in commercial systems (including wireline DSL, wireless local area networks and wireless cellular systems) has been enabled by the implementation of its transceiver functionalities in DSP, which leverages the economies of scale of digital computation (Moore’s “law”). For T large enough, the bandwidth of the OFDM signal u is approximately N/T, where N denotes the number of subcarriers. Thus, we can represent u(t) accurately by sampling at rate $1/T_s = N/T$, where $T_s$ denotes the sampling interval. Taking the subcarrier frequencies to be $f_n = n/T$, the samples are given, from (8.37), by

$$u(kT_s) = \sum_{n=0}^{N-1} B[n]\, e^{j2\pi n k/N}$$

We can recognize this simply as the inverse DFT of the symbol sequence {B[n]}. We make this explicit in the notation as follows:

$$b[k] = u(kT_s) = \sum_{n=0}^{N-1} B[n]\, e^{j2\pi n k/N}, \quad k = 0, \ldots, N-1 \tag{8.41}$$

If N is a power of 2 (which can be achieved by zeropadding if necessary), the samples can be efficiently generated from the symbols using an inverse Fast Fourier Transform (IFFT).
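The claim that sampling (8.37) yields an inverse DFT can be checked numerically. The Python/NumPy sketch below evaluates the OFDM waveform directly at $t = kT_s$ and compares the result with `np.fft.ifft`. The values N = 8, T = 1 ms, the QPSK symbols and the subcarrier frequencies $f_n = n/T$ are illustrative assumptions; note that NumPy, like Matlab, places the 1/N in the inverse transform, so the IFFT output is multiplied by N to match (8.41).

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 8, 1e-3                       # hypothetical: 8 subcarriers, T = 1 ms signaling interval
Ts = T / N                           # sampling interval, so that 1/Ts = N/T
B = rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)   # assumed QPSK symbols
f = np.arange(N) / T                 # assumed subcarrier frequencies f_n = n/T

def u(t):
    """OFDM waveform (8.37), evaluated at times t inside [0, T)."""
    return np.exp(2j * np.pi * np.outer(t, f)) @ B

k = np.arange(N)
b_direct = u(k * Ts)                 # samples u(k Ts)
b_ifft = N * np.fft.ifft(B)          # (8.41): undo NumPy's 1/N in ifft
print(np.max(np.abs(b_direct - b_ifft)))   # agreement up to rounding
```

The FFT route needs on the order of $N \log N$ operations, versus $N^2$ for the direct sum, which is what makes large N practical.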

The complex baseband waveform u(t) can now be obtained from its samples by digital-to-analog (D/A) conversion. This implementation of an OFDM transmitter is as shown in Figure 8.9: the bits are mapped to symbols, the symbols are fed in parallel to the inverse FFT (IFFT) block, and the complex baseband signal is obtained by D/A conversion of the samples (after insertion of a cyclic prefix, to be discussed after we motivate it in the context of receiver implementation). Typically, the D/A converter is an interpolating filter, so that its effect can be subsumed within the channel impulse response.


[Figure 8.9 block diagram: bits from encoder → modulator (bits to symbols) → IFFT (N complex symbols in, N complex samples out)]

Figure 8.9: DSP-centric implementation of an OFDM transmitter.


Note that the relation (8.41) can be inverted as follows:

$$B[n] = \frac{1}{N} \sum_{k=0}^{N-1} b[k]\, e^{-j2\pi n k/N} \tag{8.42}$$


This  is  exploited  in  the  digital  implementation  of  the  OFDM  receiver,  discussed  next. 
Remark on Matlab FFT and IFFT conventions: Matlab puts a factor of 1/N in the IFFT rather than in the FFT as done in (8.41) and (8.42). In both cases, however, IFFT followed by FFT gives the identity. Note also that Matlab numbers vector entries starting with one, so the FFT of x[n], n = 1, ..., N is given by

$$X[k] = \sum_{n=1}^{N} x[n]\, e^{-j2\pi (k-1)(n-1)/N}, \quad k = 1, \ldots, N$$

The corresponding IFFT is given by

$$x[n] = \frac{1}{N} \sum_{k=1}^{N} X[k]\, e^{j2\pi (k-1)(n-1)/N}$$
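NumPy's `np.fft.fft` and `np.fft.ifft` follow the same scaling convention as Matlab (1/N in the inverse) but index from zero. The sketch below checks the 1-indexed Matlab-style formula above against `np.fft.fft`, and shows one way to express (8.41) and (8.42) with library calls by moving the factor N. The random test vector is arbitrary.

```python
import numpy as np

N = 8
rng = np.random.default_rng(3)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # arbitrary test vector

# 1-indexed forward sum from the Matlab-convention formula above
X_loop = np.array([sum(x[n - 1] * np.exp(-2j * np.pi * (k - 1) * (n - 1) / N)
                       for n in range(1, N + 1))
                   for k in range(1, N + 1)])
X = np.fft.fft(x)                          # no 1/N in the forward transform
print(np.allclose(X_loop, X), np.allclose(np.fft.ifft(X), x))

# Book convention: 1/N in the forward DFT (8.42), none in the inverse (8.41)
B = x                                      # treat x as a block of N symbols
b = N * np.fft.ifft(B)                     # (8.41)
B_back = np.fft.fft(b) / N                 # (8.42)
print(np.allclose(B_back, B))
```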

We have observed that, once we limit the signaling duration to be finite, the ISI avoidance property of OFDM is approximate rather than exact. However, as we now show, orthogonality between subcarriers can be restored exactly in discrete time by using a cyclic prefix, which allows for efficient demodulation using an FFT. The noiseless received OFDM signal is modeled as

$$v(t) = \sum_{k=0}^{N-1} b[k]\, p(t - kT_s)$$

where the “effective” channel impulse response p(t) includes the effect of the D/A converter at the transmitter, the physical channel, and the receive filter. When we sample this signal at rate $1/T_s$, we obtain the discrete-time model

$$v[m] = \sum_{k=0}^{N-1} b[k]\, h[m-k] \tag{8.43}$$

where $\{h[l] = p(lT_s)\}$ is the effective discrete time channel of length L, assumed to be smaller than N. We assume, without loss of generality, that h[l] = 0 for l < 0 and l ≥ L. We can rewrite (8.43) as

$$v[m] = \sum_{l=0}^{L-1} h[l]\, b[m-l] \tag{8.44}$$

Let H[n] denote the N point DFT of h, where N > L:

$$H[n] = \sum_{l=0}^{N-1} h[l]\, e^{-j2\pi n l/N} = \sum_{l=0}^{L-1} h[l]\, e^{-j2\pi n l/N} \tag{8.45}$$

As noted in (8.42), the DFT of {b[k]} is the symbol sequence B[n] (the normalization is chosen differently in (8.42) and (8.45) to simplify the forthcoming equations). In order to parallelize equalization across the N subcarriers, we would like the DFT of the noiseless received signal to equal $V[n] = H[n]B[n]$. However, this is not quite satisfied in our setting. We now discuss why not, and how to modify the system so as to indeed enforce such a relationship. Before doing this, we need a brief discussion of the DFT and its dual operation, the cyclic convolution.

DFT multiplication and cyclic convolution: The time domain samples {b[k]} defined via the IDFT in (8.41) have range $0 \le k \le N-1$. If we now plug in integer values of k outside this range, we simply get a periodic extension $b_N[k]$ of these samples with period N, satisfying $b_N[k+N] = b_N[k]$ for all k, with $b_N[k] = b[k]$, $0 \le k \le N-1$. Thus, the IDFT can be viewed as a discrete time analogue of a Fourier series for a periodic time domain sample sequence $\{b_N[k]\}$. We know that, for the Fourier transform, “multiplication in the frequency domain corresponds to convolution in the time domain.” We skipped the analogous result for Fourier series in Chapter 2 because we did not need it then. Now, however, we establish the appropriate result for the discrete time Fourier series of interest here: for the DFT, multiplying two sequences in the frequency domain corresponds to a cyclic, or periodic, convolution in the time domain.

While the result we wish to establish is general, let us stick with the notation we have already established. Consider the “desired” sequence $V[n] = H[n]B[n]$, $n = 0, \ldots, N-1$, that we would like to get when we take the DFT of the output of the channel. What is the corresponding time domain sequence? To see this, take the IDFT:

$$v[m] = \sum_{n=0}^{N-1} H[n]\, B[n]\, e^{j2\pi n m/N}$$

Plugging in the expression (8.45) for the channel DFT coefficients, we obtain

$$v[m] = \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} h[l]\, e^{-j2\pi n l/N}\, B[n]\, e^{j2\pi n m/N} = \sum_{l=0}^{L-1} h[l] \sum_{n=0}^{N-1} B[n]\, e^{j2\pi n (m-l)/N} \tag{8.46}$$

Now, the summation over n corresponds to an IDFT, and therefore gives us $b[m-l]$ as long as $0 \le m - l \le N - 1$. Outside this range, it gives us the periodic extension $\{b_N[m-l]\}$:

$$\sum_{n=0}^{N-1} B[n]\, e^{j2\pi n (m-l)/N} = b_N[m-l]$$

Thus, we can write (8.46) as


$$v[m] = \sum_{l=0}^{L-1} h[l]\, b_N[m-l] = \sum_{l=0}^{L-1} h[l]\, b[(m-l) \bmod N] = (h \circledast b)[m] \tag{8.47}$$

where we have introduced the notation $h \circledast b$ to denote the cyclic convolution of h and b. While we have derived this result in our particular context, it is worth stating that it holds generally: the cyclic convolution modulo N between two sequences p and g, each of length at most N (it is often convenient to think of them as having length N, using zeropadding if necessary), is defined as the convolution over a period of length N of their periodic extensions with period N:

$$(p \circledast g)[m] = \sum_{l=0}^{N-1} p_N[l]\, g_N[m-l]$$

The N point DFT of the cyclic convolution of these two sequences is the product of their DFTs.
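To make the cyclic convolution and its DFT property concrete, the Python/NumPy sketch below implements $(p \circledast g)[m]$ directly from the definition and checks that its N-point DFT equals the product of the N-point DFTs of the zeropadded inputs. The numbers (h = (1, 0.5), b = (1, 2, 3, 4), N = 4) are hypothetical, chosen only to match the dimensions of the examples in Figures 8.10 to 8.12; the helper name `cyclic_conv` is also hypothetical.

```python
import numpy as np

def cyclic_conv(p, g, N):
    """Cyclic convolution modulo N: (p ⊛ g)[m] = sum_l p_N[l] g_N[m - l]."""
    p = np.pad(np.asarray(p, dtype=complex), (0, N - len(p)))   # zeropad to length N
    g = np.pad(np.asarray(g, dtype=complex), (0, N - len(g)))
    return np.array([sum(p[l] * g[(m - l) % N] for l in range(N)) for m in range(N)])

h = [1.0, 0.5]                     # hypothetical channel of length L = 2
b = [1.0, 2.0, 3.0, 4.0]           # hypothetical time-domain samples, N = 4
v = cyclic_conv(h, b, 4)
# v[0] = h[0]b[0] + h[1]b[3] = 3, v[1] = 2.5, v[2] = 4, v[3] = 5.5

# DFT of the cyclic convolution equals the product of the N-point DFTs
print(np.allclose(np.fft.fft(v), np.fft.fft(h, 4) * np.fft.fft(b, 4)))
```

Note the wraparound in v[0]: the term $h[1]b[3]$ is exactly what a linear convolution would not produce.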

Figure 8.10 illustrates cyclic convolution modulo N = 4 between a sample sequence {b[k]} of length 4 and a channel impulse response of length 2, while Figure 8.11 illustrates the corresponding linear convolution.

Let us summarize where we now stand. In order to parallelize the channel in the DFT domain, we need a cyclic convolution in the time domain given by (8.47). However, what the physical channel actually gives us is the linear convolution of the form (8.44). In order to get the cyclic convolution we want, we simply need to send an appropriately large segment of a periodic extension of the time domain samples {b[k]} through the channel. Indeed, if we only want to get N outputs corresponding to a single period of the output of the circular convolution, then we do not need a full-fledged periodic extension. Figure 8.12 shows how to get the first N = 4 outputs of a linear convolution to be equal to a period of the cyclic convolution in Figure 8.10 by inserting a single sample. More generally, we need a cyclic prefix of length L - 1 for a channel of length L, as discussed below.

[Figure 8.10: flip and slide sample sequence {b[k]} on a circle. Cyclic convolution outputs: v[0] = b[0]h[0] + b[3]h[1], v[1] = b[1]h[0] + b[0]h[1], v[2] = b[2]h[0] + b[1]h[1], v[3] = b[3]h[0] + b[2]h[1].]

Figure 8.10: Example of cyclic convolution. Time progresses clockwise on the circle. The sequence is flipped, and hence goes counter-clockwise. We then “slide” this flipped sequence clockwise in order to compute successive outputs. Clearly, the output is periodic with period N = 4.

[Figure 8.11: flip and slide sample sequence {b[k]} against channel impulse response {h[k]}. Linear convolution outputs: v[0] = b[0]h[0], v[1] = b[1]h[0] + b[0]h[1], v[2] = b[2]h[0] + b[1]h[1], v[3] = b[3]h[0] + b[2]h[1], v[4] = b[3]h[1].]

Figure 8.11: Linear convolution of the two sequences in Figure 8.10 leads to an aperiodic sequence of length 2 + 4 - 1 = 5. Note that the outputs at times 1, 2 and 3 coincide with the outputs of the corresponding cyclic convolution.

[Figure 8.12: flip and slide periodic extension of sample sequence {b[k]}; only one extra sample, b[3], is needed to ensure complete overlap with the channel coefficients. Linear convolution with cyclic prefix: v[0] = b[0]h[0] + b[3]h[1], v[1] = b[1]h[0] + b[0]h[1], v[2] = b[2]h[0] + b[1]h[1] and v[3] = b[3]h[0] + b[2]h[1] coincide with the cyclic convolution; the final output b[3]h[1] does not coincide with the circular convolution.]

Figure 8.12: Using linear convolution to emulate a circular convolution.

Since L < N, we can write the circular convolution (8.47) as

$$v[m] = \sum_{l=0}^{\min(L-1,\, m)} h[l]\, b[m-l] + \sum_{l=m+1}^{L-1} h[l]\, b[m-l+N] \tag{8.48}$$

Comparing the linear convolution (8.44) and the cyclic convolution (8.48), we see that they are identical except when the index m - l takes negative values: in this case, b[m - l] = 0 in the linear convolution, while $b[(m-l) \bmod N] = b_N[m-l] = b[m-l+N]$ contributes to the circular convolution. Thus, we can emulate a cyclic convolution using the physical linear convolution by sending a cyclic prefix; that is, by sending

$$b[k] = b_N[k] = b[N+k], \quad k = -(L-1), -(L-2), \ldots, -1$$

before we send the samples b[0], ..., b[N-1]. That is, we transmit the samples

$$b[N-L+1], \ldots, b[N-1], b[0], \ldots, b[N-1]$$

incurring an overhead of (L - 1)/N, which can be made small by choosing N to be large.

In the example depicted in Figure 8.12, N = 4 and L = 2, and we insert the cyclic prefix b[3], sending b[3], b[0], b[1], b[2], b[3] (when this is flipped for the pictorial convolution in the figure, the extra sample b[3] appears at the end).
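The cyclic prefix argument can be traced in a few lines. In this Python/NumPy sketch (hypothetical values: a length-2 channel and N = 4 samples), the transmitter prepends $b[N-L+1], \ldots, b[N-1]$, the channel performs an ordinary linear convolution via `np.convolve`, and the receiver keeps the N outputs that follow the first L - 1. These should match the cyclic convolution (8.47), and their DFT should factor as H[n] times the DFT of b.

```python
import numpy as np

N, h = 4, np.array([1.0, 0.5])            # hypothetical: N = 4, channel of length L = 2
L = len(h)
b = np.array([1.0, 2.0, 3.0, 4.0])        # hypothetical time-domain samples b[0..N-1]

tx = np.concatenate([b[N - (L - 1):], b])  # cyclic prefix b[N-L+1..N-1], then b[0..N-1]
rx = np.convolve(tx, h)                    # physical channel: linear convolution
v = rx[L - 1:L - 1 + N]                    # discard outputs due to the prefix, keep N samples

# Cyclic convolution (8.47), computed directly for comparison
v_cyclic = np.array([sum(h[l] * b[(m - l) % N] for l in range(L)) for m in range(N)])
print(np.allclose(v, v_cyclic))

# Noiseless per-subcarrier model: DFT(v) = H[n] * DFT(b)
print(np.allclose(np.fft.fft(v), np.fft.fft(h, N) * np.fft.fft(b)))
```

Here `tx` is b[3], b[0], b[1], b[2], b[3], the same sequence as in the Figure 8.12 example.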

At the receiver, the complex baseband signal is sampled at rate $1/T_s$ to obtain noisy versions of the samples {v[m]}. After the samples corresponding to the cyclic prefix are discarded, the FFT of the remaining N samples (normalized as in (8.42)) yields the model

$$Y[n] = H[n]B[n] + N[n] \tag{8.49}$$

where the frequency domain noise samples are modeled as i.i.d. complex Gaussian, with Re(N[n]) and Im(N[n]) being i.i.d. $N(0, \sigma^2)$. If the receiver knows the channel, then it can implement ML reception based on the statistic $H^*[n]Y[n]$. Thus, the task of channel equalization has been reduced to compensating for scalar channel gains for each subcarrier. This makes OFDM extremely attractive for highly dispersive channels, for which time domain single-carrier equalization strategies would be difficult to implement.

Figure 8.13: DSP-centric implementation of an OFDM receiver. Carrier and timing synchronization blocks are not shown.

Channel  estimation:  Channel  estimation  (along  with  timing  and  carrier  synchronization, 
which  are  not  considered  here)  is  accomplished  by  sending  pilot  symbols.  In  Software  Lab  8.2, 
we  send  an  entire  OFDM  symbol  as  a  pilot,  followed  by  a  succession  of  other  OFDM  symbols 
with  payload. 
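The following Python sketch shows the per-subcarrier model (8.49) and pilot-based channel estimation at work. It rests on several assumptions:

- a hypothetical 3-tap channel `h`;
- $N = 8$ subcarriers;
- an all-ones pilot OFDM symbol;
- a QPSK payload;
- no noise, so that $Y[n] = H[n]B[n]$ holds exactly.

A naive $O(N^2)$ DFT stands in for the FFT. The receiver first estimates $H[n]$ from the pilot. On each subcarrier it then forms the statistic $H^*[n]Y[n]$, scales it by $1/|H[n]|^2$ and makes a nearest-point decision. The sketch illustrates the idea and is not a reproduction of Software Lab 8.2.

```python
import cmath
import random


def dft(x):
    """Naive N-point DFT: X[n] = sum_k x[k] exp(-j 2 pi n k / N)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]


def idft(X):
    """Inverse DFT: x[k] = (1/N) sum_n X[n] exp(j 2 pi n k / N)."""
    N = len(X)
    return [sum(X[n] * cmath.exp(2j * cmath.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]


def ofdm_link(B, h):
    """IDFT -> add cyclic prefix -> L-tap channel (noiseless) -> drop prefix -> DFT."""
    N, L = len(B), len(h)
    b = idft(B)
    tx = b[N - (L - 1):] + b
    rx = [sum(h[l] * tx[m - l] for l in range(L) if m - l >= 0)
          for m in range(len(tx))]
    return dft(rx[L - 1:])


random.seed(1)
N = 8
h = [0.9, 0.4 - 0.3j, 0.2j]              # hypothetical channel taps
H = dft(h + [0] * (N - len(h)))          # true subcarrier gains H[n]
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

# Pilot OFDM symbol: known symbols on every subcarrier (assumed all ones).
pilot = [1 + 0j] * N
H_hat = [y / p for y, p in zip(ofdm_link(pilot, h), pilot)]

# Payload OFDM symbol, detected per subcarrier from H*[n] Y[n].
payload = [random.choice(qpsk) for _ in range(N)]
Y = ofdm_link(payload, h)
Z = [H_hat[n].conjugate() * Y[n] / abs(H_hat[n]) ** 2 for n in range(N)]
decisions = [min(qpsk, key=lambda s, z=z: abs(z - s)) for z in Z]
print(decisions == payload)
```

Because the sketch is noiseless, the pilot estimate matches $H[n]$ up to rounding. With noise added, each estimate and each decision statistic would be perturbed independently on each subcarrier.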


8.4  MIMO 

The term Multiple Input Multiple Output (MIMO), or space-time communication, refers to communication systems employing multiple antennas at the transmitter and receiver. We now provide a brief introduction to key concepts in MIMO systems, along with pointers for further exploration.

While  much  effort  and  expertise  must  go  into  the  design  of  antennas  and  their  interface  to 
RF  circuits,  the  following  abstract  view  suffices  for  our  purpose  here:  at  the  transmitter,  an 
antenna  transduces  electrical  signals  at  radio  frequencies  into  electromagnetic  waves  at  the  same 
frequency  that  propagate  in  space;  at  the  receiver,  the  antenna  transduces  electromagnetic  waves 
in  a  certain  frequency  range  into  electrical  signals  at  the  same  set  of  frequencies.  Antennas  which 
are  insensitive  to  the  direction  of  arrival/departure  of  the  waves  are  termed  omnidirectional  or 
isotropic  (while  there  is  no  such  thing  as  an  ideal  isotropic  antenna,  it  is  a  convenient  conceptual 
building  block).  Antennas  which  are  sensitive  to  the  direction  of  arrival  or  departure  are  termed 
directional.  It  is  possible  to  synthesize  directional  responses  using  an  array  of  omnidirectional 
antenna  elements,  as  we  discuss  next. 


8.4.1  The  linear  array 

Consider a plane wave impinging on the uniformly spaced linear array shown in Figure 8.14. The wave travels slightly different path lengths to reach the different antenna elements, and hence incurs different phase shifts. The path length difference between two successive elements is given by $\ell = d\sin\theta$. Here $d$ is the inter-element spacing, and $\theta$ is the angle of arrival (AoA) relative to the broadside. The corresponding phase shift across successive elements is given by $\phi = 2\pi\ell/\lambda = 2\pi d\sin\theta/\lambda$, where $\lambda$ denotes the wavelength.

The same result can also be obtained from the delay difference between successive elements, which is $\tau = \ell/c$. Here $c$ is the speed of wave propagation, equal to $3 \times 10^8$ m/s in free space. For carrier frequency $f_c$, the corresponding phase shift is $\phi = 2\pi f_c\tau = 2\pi f_c d\sin\theta/c$. The two expressions are equivalent, since $\lambda = c/f_c$.
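The two expressions for the inter-element phase shift can be compared numerically. This small Python sketch uses illustrative assumptions: a 2.4 GHz carrier, half-wavelength spacing and a 30 degree AoA. The function names are hypothetical.

```python
import math

C = 3e8  # speed of propagation in free space (m/s)


def phase_from_wavelength(d, theta, wavelength):
    """phi = 2 pi d sin(theta) / lambda."""
    return 2 * math.pi * d * math.sin(theta) / wavelength


def phase_from_delay(d, theta, fc):
    """phi = 2 pi fc tau, where tau = d sin(theta) / c is the inter-element delay."""
    tau = d * math.sin(theta) / C
    return 2 * math.pi * fc * tau


fc = 2.4e9                  # assumed carrier frequency (Hz)
lam = C / fc                # wavelength: 0.125 m
d = lam / 2                 # assumed half-wavelength spacing
theta = math.radians(30)    # assumed angle of arrival

phi1 = phase_from_wavelength(d, theta, lam)
phi2 = phase_from_delay(d, theta, fc)
print(phi1, phi2)           # both equal pi/2 for these values
```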

Figure 8.14: A plane wave impinging on a linear array.

The narrowband assumption: What is the effect of the differences in delays seen by successive elements? Suppose that the wave impinging on element 1 is represented as

$$u_p(t) = u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t = \mathrm{Re}\left(u(t)e^{j2\pi f_c t}\right)$$

where $u(t) = u_c(t) + ju_s(t)$ is the complex envelope, assumed to be of bandwidth $W$. Suppose that the bandwidth satisfies $W \ll f_c$. This is the so-called “narrowband assumption,” which holds in most practical settings. For the scenario shown in the figure, the wave arrives $\tau = \ell/c$ time units earlier at element 2. The wave impinging on element 2 can therefore be represented as


$$v_p(t) = u_c(t+\tau)\cos 2\pi f_c(t+\tau) - u_s(t+\tau)\sin 2\pi f_c(t+\tau) = \mathrm{Re}\left(u(t+\tau)e^{j\phi}e^{j2\pi f_c t}\right)$$

where $\phi = 2\pi f_c\tau$. Thus, the complex envelope of the wave at element 2 is $v(t) = u(t+\tau)e^{j\phi}$. The time shift $\tau$ therefore has two effects on the complex envelope: a time shift in the baseband waveform $u$, and a phase rotation $\phi$ due to the carrier. However, for most settings of interest, the time shift in the baseband waveform can be ignored. To see why, suppose that the array parameters are such that $\phi$ is of the order of $2\pi$ or less, in which case $\tau$ is of the order of $1/f_c$ or less. Under the narrowband assumption, the time shift $\tau$ then produces little distortion in $u$. To see this, note that

$$u(t+\tau) \leftrightarrow U(f)e^{j2\pi f\tau}$$

As $f$ varies over a range $W$, the frequency-dependent phase change produced by the time shift varies over a range $2\pi W\tau \sim 2\pi W/f_c$. This range is much smaller than $2\pi$ when $W \ll f_c$. Thus, we can ignore the effect of the time shift on the complex envelope, and model the complex envelope at element 2 as $v(t) \approx u(t)e^{j\phi}$. Similarly, the complex envelope at element 3 is well approximated as $u(t)e^{j2\phi}$, and so on for the remaining elements.
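Here is a quick numerical check of the narrowband argument. For illustration only, it assumes a 2.4 GHz carrier, a 20 MHz signal bandwidth and a worst-case delay $\tau = 1/f_c$ between adjacent elements.

```python
import math

fc = 2.4e9        # assumed carrier frequency (Hz)
W = 20e6          # assumed signal bandwidth (Hz)
tau = 1 / fc      # inter-element delay of order 1/fc, i.e. phi of order 2*pi

phase_spread = 2 * math.pi * W * tau               # variation of 2*pi*f*tau across the band (rad)
fraction_of_cycle = phase_spread / (2 * math.pi)   # equals W / fc
print(f"phase spread = {phase_spread:.4f} rad ({fraction_of_cycle:.2%} of a cycle)")
```

The phase error from ignoring the baseband time shift is then a small fraction of a cycle across the whole band. This is why the time shift can be absorbed into a single phase rotation $e^{j\phi}$.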

Array response and spatial frequency: Under the narrowband assumption, if the complex envelope at element 1 is $u(t)$, then the complex envelopes at the various elements can be collected into a vector $u(t)\mathbf{a}$, where

$$\mathbf{a} = \left(1, e^{j\phi}, \ldots, e^{j(N-1)\phi}\right)^T \qquad (8.50)$$

is the array response for a particular AoA. Making the dependence on the AoA $\theta$ explicit for the linear array gives $\phi(\theta) = 2\pi d\sin\theta/\lambda$, with corresponding array response $\mathbf{a}(\theta)$. The linear increase in phase across antenna elements (i.e., across space) is analogous to the linear increase of phase across time for a sinusoid. We therefore call $\phi = \phi(\theta)$ the spatial frequency corresponding to AoA $\theta$. The collection of array responses $\{\mathbf{a}(\theta), \theta \in [-\pi, \pi]\}$ obtained as we vary the AoA is termed the array manifold.
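The array response (8.50) is straightforward to compute. The following is a minimal Python sketch. It uses a hypothetical helper `array_response` and expresses the spacing in wavelengths, which is an assumption matching the normalized spacing convention of Code Fragment 8.4.1.

```python
import cmath
import math


def array_response(theta, N, d_over_lambda):
    """a(theta) = (1, e^{j phi}, ..., e^{j (N-1) phi}) with phi = 2 pi (d/lambda) sin(theta)."""
    phi = 2 * math.pi * d_over_lambda * math.sin(theta)
    return [cmath.exp(1j * k * phi) for k in range(N)]


a = array_response(math.radians(20), N=4, d_over_lambda=0.5)
broadside = array_response(0.0, N=4, d_over_lambda=0.5)   # phi = 0: all ones
print([round(abs(x), 6) for x in a])   # unit-magnitude entries
```

Each entry has unit magnitude, and the phase grows linearly with the element index. This is the "spatial sinusoid" at spatial frequency $\phi(\theta)$.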




Reciprocity:  While  Figure  8.14  depicts  an  antenna  array  receiving  a  wave,  exactly  the  same 
reasoning  applies  to  an  antenna  array  emitting  a  wave.  In  particular,  the  principle  of  reciprocity 
tells  us  that  the  propagation  channel  from  transmitter  to  receiver  is  the  same  as  that  from  receiver 
to transmitter. Thus, the array response of a linear array for angle of arrival $\theta$ is the same as its array response for angle of departure $\theta$.


Figure 8.15: MIMO signal processing architecture. There is one “RF chain” per antenna, which downconverts the signal received at that antenna to I and Q components.


Signal  processing  architecture:  What  the  preceding  complex  baseband  model  means  physically 
is  that,  if  we  downconvert  the  RF  signals  at  the  outputs  of  the  antenna  elements  (using  the 
same LO frequency and phase, and filters with identical responses, in each such “RF chain”),
then  the  complex  envelopes  corresponding  to  the  different  antenna  elements  will  be  related  as 
described  above.  Once  the  I  and  Q  components  for  these  complex  envelopes  are  obtained, 
they  would  typically  be  sampled  and  quantized  using  analog-to-digital  converters  (ADCs),  and 
then  processed  digitally.  Such  a  DSP-centric  signal  processing  architecture,  depicted  in  Figure 
8.15,  allows  the  implementation  of  sophisticated  MIMO  algorithms  in  today’s  cellular  and  WiFi 
systems.  While  the  figure  depicts  a  receiver  architecture,  an  entirely  analogous  block  diagram  can 
be  drawn  for  a  MIMO  transmitter,  simply  by  reversing  the  arrows  and  replacing  downconverters 
by  upconverters. 

While  the  DSP-centric  architecture  depicted  in  Figure  8.15  has  been  key  to  enabling  the  widespread 
deployment  of  low-cost  MIMO  transceivers,  it  may  need  to  be  revisited  as  carrier  frequencies, 
and  the  available  signaling  bandwidths,  scale  up.  Both  the  cost  and  power  consumption  of  ADCs 
with  adequate  precision  can  be  prohibitively  large  at  very  high  sampling  rates,  hence  alternative 
architectures  with  MIMO  processing  done,  wholly  or  in  part,  prior  to  ADC  may  need  to  be 
considered.  See  the  epilogue  for  further  discussion. 


8.4.2  Beamsteering 

Once we know the array response for a given direction, we can maximize the received power (for a receive antenna array) or the transmitted power (for a transmit antenna array) in that direction by employing a spatial matched filter or spatial correlator. Suppose that the first antenna element receives a complex baseband waveform $s[n]$ (after downconversion and sampling) from AoA $\theta$. Then the output of the antenna array is modeled as a vector of complex baseband discrete time signals with $k$th component

$$y_k[n] = e^{j(k-1)\phi(\theta)}s[n] + w_k[n], \quad k = 1, 2, \ldots, N \qquad (8.51)$$

where $\phi(\theta)$ is the spatial frequency corresponding to $\theta$. The $w_k[n]$ are typically modeled as complex WGN, independent across space and time: $\mathrm{Re}(w_k[n])$ and $\mathrm{Im}(w_k[n])$ are i.i.d. $\mathcal{N}(0, \sigma^2)$ for all $k, n$. In vector notation, we can write

$$\mathbf{y}[n] = \mathbf{a}(\theta)s[n] + \mathbf{w}[n] \qquad (8.52)$$

where $\mathbf{y}[n] = (y_1[n], \ldots, y_N[n])^T$, $\mathbf{w}[n] = (w_1[n], \ldots, w_N[n])^T$, and $\mathbf{a}(\theta)$ is the array response corresponding to direction $\theta$. We have not discussed complex WGN in detail in this text. However, in analogy with the results in Chapters 5 and 6 for real WGN, it is possible to show that correlation against a noiseless signal template is the right thing to do. Thus, regardless of the value of the time domain sample $s[n]$, the spatial processing that maximizes SNR is correlation against the noiseless template $\mathbf{a}(\theta)$. That is, we wish to compute the decision statistics

$$Z[n] = \langle \mathbf{y}[n], \mathbf{a}(\theta)\rangle = \mathbf{a}^H(\theta)\mathbf{y}[n] \qquad (8.53)$$

Correlating the spatial signal against the array response in this fashion is termed beamforming. The desired signal contribution to the decision statistic obtained from beamforming is $\|\mathbf{a}(\theta)\|^2 s[n] = Ns[n]$. Thus, the signal amplitude gets scaled by a factor of $N$, and hence the signal power gets scaled by a factor of $N^2$.

The variance of the noise contribution to the decision statistic gets amplified by a factor of $N$. This is because $\mathbf{a}^H(\theta)\mathbf{w}[n]$ is a sum of $N$ independent noise terms, each weighted by a unit-magnitude coefficient. The net result is that beamforming at the receiver amplifies the SNR by a factor of $N$, which is called the beamforming gain. Receive beamforming is also termed maximal ratio combining, because it combines the spatial signal in a manner that maximizes the signal-to-noise ratio.
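The receive beamforming gain can be checked by simulation. The Python sketch below uses illustrative assumptions throughout: $N = 8$ elements at half-wavelength spacing, an AoA of 25 degrees, a unit-magnitude sample $s$ and $\sigma = 1$. The helpers are hypothetical. The simulation checks three things: the signal term of $Z = \mathbf{a}^H(\theta)\mathbf{y}$ is $Ns$, the noise variance grows from $2\sigma^2$ to about $2N\sigma^2$, and the SNR therefore improves by roughly $N$.

```python
import cmath
import math
import random


def array_response(theta, N, d_over_lambda):
    phi = 2 * math.pi * d_over_lambda * math.sin(theta)
    return [cmath.exp(1j * k * phi) for k in range(N)]


def inner(y, a):
    """<y, a> = a^H y = sum_k conj(a_k) y_k."""
    return sum(ak.conjugate() * yk for yk, ak in zip(y, a))


random.seed(0)
N, sigma, s = 8, 1.0, 1.0 + 0j                  # assumed parameters
a = array_response(math.radians(25), N, 0.5)

# Signal contribution to Z = a^H y: ||a||^2 s = N s.
signal_part = inner([ak * s for ak in a], a)

# Noise contribution a^H w, with Re and Im of each w_k i.i.d. N(0, sigma^2).
trials = 20000
noise_power = 0.0
for _ in range(trials):
    w = [complex(random.gauss(0, sigma), random.gauss(0, sigma)) for _ in range(N)]
    noise_power += abs(inner(w, a)) ** 2
noise_power /= trials                            # approx. N * 2 sigma^2

snr_single = abs(s) ** 2 / (2 * sigma ** 2)      # one antenna
snr_beamformed = abs(signal_part) ** 2 / noise_power
print("beamforming gain ~", snr_beamformed / snr_single)
```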

Receive beamforming gathers energy coming from a given direction. Conversely, transmit beamforming can be used to direct energy in a given direction. For example, suppose a linear transmit antenna array seeks to direct energy towards an angle of departure $\theta$. To send a time domain sample $s[n]$, it should transmit the spatial vector $s[n]\mathbf{a}^*(\theta)$. Since the spatial channel to the receiver is $\mathbf{a}(\theta)$, the received signal is $s[n]\mathbf{a}^H(\theta)\mathbf{a}(\theta) = Ns[n]$. Thus, the received amplitude scales as $N$, and the received power as $N^2$.

The noise at the receiver does not get the benefit of this transmit beamforming gain. Transmit beamforming with $N$ antennas therefore leads to an SNR gain of $N^2$ relative to a single antenna system, if we fix the per-antenna emitted power. The signal transmitted from antenna $k$ is $s[n]e^{-j(k-1)\phi(\theta)}$, which has power $|s[n]|^2$. Since we have $N$ antenna elements, we are transmitting at $N$ times the power. The additional factor of $N$ in received power arises because, with appropriately chosen beamforming coefficients, the signals from the $N$ antenna elements add up in phase at the receiver. This coherent addition yields an $N$-fold gain.

Thus, both transmit and receive beamforming perform spatial matched filtering, leading to a beamforming gain of $N$. That is, the SNR is enhanced by a factor of $N$. In addition, suppose each element in a transmit antenna array transmits at a power equal to that of a reference single element antenna. Then transmit beamforming gets an additional power combining gain of $N$, leading to a net SNR gain of $N^2$.
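The $N^2$ transmit gain under a per-antenna power constraint can be seen directly. This Python sketch uses illustrative parameters, and the helper name is hypothetical. Antenna $k$ sends $s\,a_k^*(\theta)$ and the channel to the receiver is $\mathbf{a}(\theta)$. The received power is compared with what a single antenna emitting $|s|^2$ would deliver through a unit-gain channel.

```python
import cmath
import math


def array_response(theta, N, d_over_lambda):
    phi = 2 * math.pi * d_over_lambda * math.sin(theta)
    return [cmath.exp(1j * k * phi) for k in range(N)]


N, s = 8, 1.0 + 0j                                # assumed parameters
a = array_response(math.radians(-40), N, 0.5)     # channel to the receiver

tx = [s * ak.conjugate() for ak in a]             # antenna k sends s * conj(a_k)
per_antenna_power = [abs(x) ** 2 for x in tx]     # each equals |s|^2
total_tx_power = sum(per_antenna_power)           # N |s|^2

rx = sum(ak * xk for ak, xk in zip(a, tx))        # signals add in phase: N s
received_power = abs(rx) ** 2                     # N^2 |s|^2
single_antenna_power = abs(s) ** 2                # reference SISO link

print(received_power / single_antenna_power)      # N^2: power gain N times coherence gain N
```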

Beamforming directs energy in a given direction by ensuring that the radio waves emitted or received from that direction (or their complex envelopes) add constructively, or in phase. The radio waves in other directions may add constructively or destructively, depending on the array geometry. It is therefore of interest to characterize the beam pattern corresponding to a particular set of beamforming coefficients. If we are beamforming in direction $\theta_0$, then the gain in an arbitrary direction $\theta$ is given by

$$G(\theta; \theta_0) = |\langle \mathbf{a}(\theta), \mathbf{a}(\theta_0)\rangle| = |\mathbf{a}^H(\theta_0)\mathbf{a}(\theta)|$$

The  following  code  fragment  computes  and  plots  the  beam  pattern  for  a  linear  array. 

Code  Fragment  8.4.1  Plotting  beam  patterns  for  a  linear  array 
d=1/3; %normalized inter-element spacing (d/lambda)
N=10; %number of array elements

theta0_degrees = 0; %desired angle from broadside in degrees
theta0=theta0_degrees*pi/180; %desired angle from broadside in radians
phi0=2*pi*d*sin(theta0); %desired spatial frequency
a0=exp(j*phi0*[0:N-1]); %array response in desired direction (as a row)
theta_degrees = -90:1:90; %sweep of angles with respect to broadside
theta = theta_degrees*pi/180; %(angles in radians)

phi = 2*pi*d*sin(theta); %spatial freqs as a function of angle wrt broadside (as a row)
%array responses corresponding to the spatial freqs as columns
array_responses = exp(j*transpose([0:N-1])*phi);

%inner product of desired array response with array responses in other directions
rho = conj(a0)*array_responses;

plot(theta_degrees, 10*log10(abs(rho))); %plot gain (dB) versus angle
hold on;

stem(theta0_degrees, 10*log10(N), 'r'); %indicates desired direction
xlabel('Angle with respect to broadside');
ylabel('Gain (dB)');


Figure  8.16:  Example  beam  patterns  with  a  linear  array,  generated  using  code  fragment  8.4.1. 

Array spacing: In the preceding code fragment, the element spacing is set to $\lambda/3$. In Problem 8.10, we explore the effect of varying the element spacing, and in particular what happens when the spacing exceeds $\lambda/2$.

Notational convention: We say that we employ beamforming weights or coefficients $\mathbf{c} = (c_1, \ldots, c_N)^T$ when we apply the coefficient $c_i^*$ to the $i$th antenna element. Consider a receive beamformer whose received spatial signal is $\mathbf{y} = (y_1, \ldots, y_N)^T$. Using beamforming weights $\mathbf{c}$ corresponds to computing the inner product $\langle \mathbf{y}, \mathbf{c}\rangle = \mathbf{c}^H\mathbf{y} = \sum_{i=1}^{N} c_i^* y_i$. With this convention, the beamforming weights for directing a beam in direction $\theta$ are given by $\mathbf{c} = \mathbf{a}(\theta)$.

Steering nulls: As Figure 8.16 shows, forming a beam in a given direction maximizes the beam pattern in that direction, creating a main lobe. It also generates other local maxima in other directions, typically of lower strength. These are called sidelobes, and they are often small enough compared to the main lobe that we do not worry about them. Sometimes, however, we want to be extra careful to guarantee that power is not accidentally steered in an undesired direction. For example, a cellular base station employing a beamforming array to receive a signal from mobile A may wish to null out interference from mobile B.

We can use a ZF approach, analogous to the one discussed in detail in Section 8.2.2. If mobile A is in direction $\theta_A$ and mobile B in direction $\theta_B$, then we wish to align $\mathbf{c}$ with $\mathbf{a}(\theta_A)$ as best we can, while staying orthogonal to $\mathbf{a}(\theta_B)$. We can therefore choose the beamforming weights to be a scaled version of the projection of $\mathbf{a}(\theta_A)$ orthogonal to the interference subspace spanned by $\mathbf{a}(\theta_B)$. This projection is given by


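$$\mathbf{c}_A = \mathbf{a}(\theta_A) - \frac{\langle \mathbf{a}(\theta_A), \mathbf{a}(\theta_B)\rangle}{\langle \mathbf{a}(\theta_B), \mathbf{a}(\theta_B)\rangle}\,\mathbf{a}(\theta_B) \qquad (8.54)$$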

While  the  ZF  approach  has  the  advantage  of  having  a  clear  geometric  interpretation,  in  practice, 
when  implementing  this  at  the  receiver,  we  may  often  employ  the  MMSE  criterion  (see  Sections 
8.2  and  8.2.2),  which  lends  itself  to  adaptive  implementation. 

We  can  combine  beam  and  null  steering  in  this  fashion  at  the  transmitter  as  well  as  the  receiver. 
There  are  some  additional  issues  when  employing  this  approach  at  the  transmitter.  First,  the 
transmitter  must  know  the  array  responses  corresponding  to  the  different  receivers  it  is  steering 
beams  or  nulls  towards,  which  requires  either  explicit  feedback,  or  implicit  feedback  derived 
from reciprocity. Second, we must scale the weights appropriately to meet constraints on transmit power. Average power scales with $\|\mathbf{c}\|^2$, while peak power scales with $\max_i |c_i|^2$.
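The zero-forcing weights (8.54) and the two power measures can be computed as follows. This minimal Python sketch uses illustrative assumptions: $N = 8$, half-wavelength spacing, $\theta_A = 10$ degrees and $\theta_B = -30$ degrees. The helper names are hypothetical. The sketch projects $\mathbf{a}(\theta_A)$ orthogonally to $\mathbf{a}(\theta_B)$ and checks that the resulting pattern has a null toward B. It also reports $\|\mathbf{c}_A\|^2$ and $\max_i |c_{A,i}|^2$.

```python
import cmath
import math


def array_response(theta, N, d_over_lambda):
    phi = 2 * math.pi * d_over_lambda * math.sin(theta)
    return [cmath.exp(1j * k * phi) for k in range(N)]


def inner(u, v):
    """<u, v> = v^H u = sum_k u_k conj(v_k)."""
    return sum(uk * vk.conjugate() for uk, vk in zip(u, v))


def zf_weights(a_desired, a_interf):
    """Projection of a_desired orthogonal to a_interf, as in (8.54)."""
    coef = inner(a_desired, a_interf) / inner(a_interf, a_interf)
    return [x - coef * y for x, y in zip(a_desired, a_interf)]


N = 8
a_A = array_response(math.radians(10), N, 0.5)
a_B = array_response(math.radians(-30), N, 0.5)
c_A = zf_weights(a_A, a_B)

gain_to_A = abs(inner(a_A, c_A))             # |c_A^H a(theta_A)|
gain_to_B = abs(inner(a_B, c_A))             # |c_A^H a(theta_B)|, zero by construction
avg_power = sum(abs(c) ** 2 for c in c_A)    # ||c_A||^2
peak_power = max(abs(c) ** 2 for c in c_A)   # max_i |c_{A,i}|^2
print(gain_to_A, gain_to_B, avg_power, peak_power)
```

The gain toward A is slightly below the unconstrained value $N$. This small loss is the price of the null.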

Space division multiple access (SDMA): Beamforming and nullforming can enable a single receiver to receive from multiple transmitters, using a common set of time-frequency resources. Conversely, they can enable a single transmitter to transmit separate messages to different receivers over the same resources. This is termed space division multiple access (SDMA). For example, suppose the transmitter wants to send a message signal $s_A(t)$ to mobile A without interfering with mobile B, and a message signal $s_B(t)$ to mobile B without interfering with mobile A. It then sends the “space-time” signal

$$\mathbf{y}(t) = s_A(t)\mathbf{c}_A^* + s_B(t)\mathbf{c}_B^* \qquad (8.55)$$

where $\mathbf{c}_A$ is the zero-forcing solution in (8.54), and $\mathbf{c}_B$ is a zero-forcing solution with the roles of A and B interchanged. That is, the signal transmitted from the $i$th antenna is a linear combination of the two message signals, $y_i(t) = s_A(t)c_{A,i}^* + s_B(t)c_{B,i}^*$. The conjugation of the beamforming weights follows the convention discussed earlier. A receiver with an antenna array can use similar techniques to receive signals from multiple transmitters at the same time. SDMA is explored further in Problem 8.11.
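The following Python sketch forms the SDMA signal (8.55), using the zero-forcing weights (8.54) for both mobiles. It checks that each mobile receives only its own message, scaled by a known gain. The angles, array size and symbol values are illustrative assumptions, and the helpers are hypothetical and repeated so that the block runs on its own. As in the transmit beamforming discussion, the mobile in direction $\theta$ is assumed to receive $\sum_i a_i(\theta)y_i$.

```python
import cmath
import math


def array_response(theta, N, d_over_lambda):
    phi = 2 * math.pi * d_over_lambda * math.sin(theta)
    return [cmath.exp(1j * k * phi) for k in range(N)]


def inner(u, v):
    """<u, v> = v^H u."""
    return sum(uk * vk.conjugate() for uk, vk in zip(u, v))


def zf_weights(a_desired, a_interf):
    coef = inner(a_desired, a_interf) / inner(a_interf, a_interf)
    return [x - coef * y for x, y in zip(a_desired, a_interf)]


N = 8
a_A = array_response(math.radians(10), N, 0.5)
a_B = array_response(math.radians(-30), N, 0.5)
c_A = zf_weights(a_A, a_B)    # null toward B
c_B = zf_weights(a_B, a_A)    # null toward A

s_A, s_B = 1 + 1j, -1 + 1j    # one sample of each message (assumed)
y = [s_A * cA.conjugate() + s_B * cB.conjugate() for cA, cB in zip(c_A, c_B)]

r_A = sum(ak * yk for ak, yk in zip(a_A, y))   # what mobile A receives
r_B = sum(bk * yk for bk, yk in zip(a_B, y))   # what mobile B receives
g_A, g_B = inner(a_A, c_A), inner(a_B, c_B)    # c_A^H a_A, c_B^H a_B
print(r_A / g_A, r_B / g_B)                    # recovers s_A and s_B
```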


8.4.3  Rich  Scattering  and  MIMO-OFDM 

While Section 8.4.2 focuses on beamsteering and nullsteering along specific directions, the channel
between  transmitter  and  receiver  may  often  be  characterized  by  a  large  number  of  paths,  possibly 
corresponding  to  different  directions  of  arrival  or  departure.  Indoor  WiFi  channels  are  one 
example  of  such  “rich  scattering”  channels.  Figure  8.17  shows  some  of  the  paths  obtained  from 
two-dimensional  ray  tracing  between  a  transmitter  and  a  receiver  in  a  rectangular  room.  These 
include all four first-order reflections (single bounces) and two second-order reflections (two
bounces).  Not  all  of  these  have  equal  attenuation  (the  attenuation  of  a  path  depends  on  its 
length,  as  well  as  the  angles  of  incidence  and  the  type  of  material  at  each  surface  it  reflects  off 
of),  but  we  can  see  from  the  construction  of  the  second-order  reflections  that  the  number  of 
paths  quickly  becomes  large  as  we  start  accounting  for  multiple  bounces.  Of  course,  the  path 
strengths  start  dying  out  as  the  number  of  bounces  increases,  since  there  is  a  loss  in  strength  for 
each  bounce,  but  for  typical  indoor  environments  in  the  WiFi  bands  (2.4  and  5  GHz),  there  are 
many  paths  with  nontrivial  gains. 
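The “virtual source” construction in Figure 8.17 is the image method. A single reflection off a wall is equivalent to a straight-line path from the mirror image of the transmitter in that wall. Below is a minimal Python sketch for the four first-order reflections in a rectangular room. The room dimensions and positions are hypothetical, and path attenuation is not modeled.

```python
import math


def first_order_images(tx, width, height):
    """Mirror images of tx = (x, y) in the four walls of [0, width] x [0, height]."""
    x, y = tx
    return {
        "wall x=0": (-x, y),
        "wall x=width": (2 * width - x, y),
        "wall y=0": (x, -y),
        "wall y=height": (x, 2 * height - y),
    }


def path_lengths(tx, rx, width, height):
    """Direct path length plus the length of each single-bounce path."""
    lengths = {"direct": math.dist(tx, rx)}
    for wall, image in first_order_images(tx, width, height).items():
        lengths[wall] = math.dist(image, rx)   # length of the unfolded reflected path
    return lengths


for name, length in path_lengths((2.0, 3.0), (7.0, 2.0), 10.0, 6.0).items():
    print(f"{name:>14}: {length:.3f} m")
```

Second-order paths follow by mirroring the images again, which is why the number of paths grows quickly with the number of bounces.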

Exercise:  What  is  the  total  number  of  second-order  reflections  in  the  scenario  depicted  in  Figure 
8.17? 




Figure 8.17: Ray tracing to determine paths between a transmitter and a receiver inside a “two-dimensional room.” All first-order reflections, and two second-order reflections, are shown. The lightly shaded circles depict “virtual sources” employed to perform ray tracing.
Figure 8.18: A typical propagation environment between an elevated base station and a mobile
in  urban  clutter.  The  mobile  sees  a  rich  scattering  environment  locally,  due  to  reflections  from 
building  and  street  surfaces.  However,  from  the  base  station’s  viewpoint,  the  paths  to  the  mobile 
fall  within  a  narrow  angular  spread. 
Even in outdoor settings, such as cellular networks, mobiles in an urban environment may see rich scattering because of bounces from the buildings around them. An elevated base station, however, may still see a relatively sparse scattering environment. Such a situation is depicted in Figure 8.18. Since the base station sees a narrow angular spread, it may be able to employ beamforming strategies effectively, for example by forming a beam along the “mean” angle of arrival/departure. However, the mobile transceiver must account for the rich scattering environment that it sees.

At  this  point,  the  reader  is  encouraged  to  quickly  review  Section  2.9.  As  we  noted  there,  a 
multipath  channel  has  a  transfer  function  which  is  “frequency-selective”  (i.e.,  it  varies  with 
frequency).  Now  that  we  have  multiple  antennas,  each  antenna  sees  a  frequency-selective  channel, 
so  that  the  net  array  response  is  frequency-selective.  However,  we  can  model  the  array  response  as 
constant over a small enough frequency slice, one smaller than the coherence bandwidth (see the discussion
in  Section  2.9).  OFDM  (see  Section  8.3)  naturally  decomposes  the  channel  into  such  slices,  and 
each  subcarrier  in  a  MIMO-OFDM  system  may  see  a  different  array  response.  Thus,  we  can  apply 
MIMO  processing  in  parallel  to  each  subcarrier  after  downconversion  and  OFDM  processing,  as 
shown  in  Figure  8.19. 


Figure 8.19: Typical MIMO-OFDM receiver architecture. After downconverting and sampling the received signal from each antenna, we apply OFDM processing to separate out the subcarriers. After the FFT, the samples for a given subcarrier, say $k$, from the different antennas are collected together for per-subcarrier MIMO processing. Thus, each subcarrier sees a different narrowband MIMO channel.


Focus now on a single subcarrier in a MIMO-OFDM system; the model also applies to narrowband signaling with bandwidth smaller than the channel coherence bandwidth. Consider a link with $M$ transmit antennas and $N$ receive antennas. Over a subcarrier, the channel from transmit element $m$ to receive element $n$ is a complex-valued scalar, which we denote by $H_{nm}$. If the transmitter sends a complex symbol $x_m$ from antenna $m$, then the $n$th receive antenna sees the linear combination

$$y_n = \sum_{m=1}^{M} H_{nm}x_m + w_n \qquad (8.56)$$

where $w_n$ is the complex-valued noise seen at the $n$th receive antenna. The preceding can be written in matrix-vector notation as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w} \qquad (8.57)$$

where $\mathbf{y} = (y_1, \ldots, y_N)^T$ is the received vector and $\mathbf{x} = (x_1, \ldots, x_M)^T$ is the transmitted vector. $\mathbf{H}$ is the $N \times M$ channel matrix, whose $m$th column is the receive array response seen by the $m$th transmit element.

Noise model: The complex-valued noise $w_n$ is typically modeled as follows: $\mathrm{Re}(w_n)$ and $\mathrm{Im}(w_n)$ are i.i.d. $\mathcal{N}(0, \sigma^2)$, and are independent across receive antennas. The noise vector $\mathbf{w}$ is said to be a (circularly symmetric) complex Gaussian random vector. Such a vector is completely characterized by its mean $E[\mathbf{w}] = \mathbf{0}$ and its covariance matrix $\mathbf{C}_w = E\left[(\mathbf{w} - E[\mathbf{w}])(\mathbf{w} - E[\mathbf{w}])^H\right] = 2\sigma^2\mathbf{I}$. Its distribution is denoted by $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, 2\sigma^2\mathbf{I})$, and the distribution of any entry is $w_n \sim \mathcal{CN}(0, 2\sigma^2)$.

Remark on notation: According to our convention, which is consistent with most literature in the field, the channel matrix $\mathbf{H}$ of an $M \times N$ MIMO system (i.e., one with $M$ transmit antennas and $N$ receive antennas) is an $N \times M$ matrix. The reason for this choice of convention is that we like working with column vectors:

- $\mathbf{x}$ is the $M \times 1$ column vector of symbols transmitted from the different transmit antennas;
- the $m$th column of $\mathbf{H}$ is the receiver's spatial response to the $m$th transmit antenna;
- $\mathbf{y}$ is the $N \times 1$ column vector of received samples.
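The model (8.57) and the noise model above can be instantiated as follows. This Python sketch assumes $M = 2$, $N = 3$ and $\sigma = 0.1$. Purely for illustration, it draws the entries of $\mathbf{H}$ i.i.d. $\mathcal{CN}(0,1)$, as in the rich scattering model discussed next. The helper `cn` is hypothetical. The sketch also checks empirically that a $\mathcal{CN}(0, 2\sigma^2)$ sample built from i.i.d. $\mathcal{N}(0,\sigma^2)$ real and imaginary parts has mean power $2\sigma^2$.

```python
import math
import random

rng = random.Random(11)


def cn(var):
    """One CN(0, var) sample: real and imaginary parts i.i.d. N(0, var/2)."""
    sd = math.sqrt(var / 2)
    return complex(rng.gauss(0, sd), rng.gauss(0, sd))


M, N, sigma = 2, 3, 0.1                                  # 2 transmit, 3 receive antennas (assumed)
H = [[cn(1.0) for _ in range(M)] for _ in range(N)]      # N x M channel matrix
x = [1 + 0j, -1 + 0j]                                    # M x 1 transmitted vector
w = [cn(2 * sigma ** 2) for _ in range(N)]               # w ~ CN(0, 2 sigma^2 I)

# y_n = sum_m H_nm x_m + w_n, as in (8.56)-(8.57).
y = [sum(H[n][m] * x[m] for m in range(M)) + w[n] for n in range(N)]

# Empirical check of E|w_n|^2 = 2 sigma^2.
trials = 20000
mean_power = sum(abs(cn(2 * sigma ** 2)) ** 2 for _ in range(trials)) / trials
print(len(y), mean_power)
```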

Operations  such  as  beamforming  and  nullforming  can  now  be  performed  separately  for  each 
subcarrier.  However,  these  operations  are  no  longer  associated  with  directing  energy  or  nulls 
towards  particular  physical  directions,  since  the  spatial  response  in  each  subchannel  is  a  linear 
combination  of  array  responses  associated  with  many  directions.  A  particularly  simple  model  for 
the  resulting  channel  gains  for  a  given  subcarrier  is  described  next. 

Rich scattering model: The path gains $H(n, m)$ for a given subcarrier are a function of the channel impulse responses between each transmit/receive pair. They are often modeled statistically, however, to provide quick insight into design tradeoffs independently of the specific propagation geometry. We now discuss a particularly simple model, motivated by “rich scattering environments” in which there are a large number of paths of roughly equal strength between the transmitter and receiver. Let $h = H(n, m)$ denote the complex gain between a typical transmit/receive antenna pair. We can write

$$h = \sum_{i=1}^{L} A_i e^{j\theta_i}$$

where $L$ is the number of paths, and $A_i > 0$ and $\theta_i \in [0, 2\pi]$ are the amplitude and phase of the complex-valued path gain for the given subcarrier. We therefore have

$$\mathrm{Re}(h) = \sum_{i=1}^{L} A_i\cos\theta_i, \qquad \mathrm{Im}(h) = \sum_{i=1}^{L} A_i\sin\theta_i$$

Suppose that the differences between the lengths of the different paths are comparable to, or larger than, a carrier wavelength. This is typically the case even for indoor WiFi links, and certainly for outdoor cellular links. We can then model the phases $\theta_i$ as i.i.d. uniform over $[0, 2\pi]$. If, in addition, the amplitudes of the different paths are roughly comparable, then we can apply the central limit theorem. It lets us approximate $\mathrm{Re}(h)$ and $\mathrm{Im}(h)$ as i.i.d. $\mathcal{N}\left(0, \sum_{i=1}^{L} A_i^2/2\right)$.

Let us now normalize $\sum_{i=1}^{L} A_i^2 = 1$ without loss of generality. We can instead scale the noise variance to adjust the average SNR, which is $\overline{\mathrm{SNR}} = \frac{1}{2\sigma^2}$ for this model. We can therefore model $h$ as a zero mean complex Gaussian random variable: $h \sim \mathcal{CN}(0, 1)$. Furthermore, for rich scattering environments, it is assumed that different transmit/receive antenna pairs see sufficiently different phases. We can then model the gains $H(n, m)$ as i.i.d. $\mathcal{CN}(0, 1)$ across transmit/receive antenna pairs $(m, n)$.
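The central limit argument can be checked by simulation. The Python sketch below assumes $L = 30$ paths with equal amplitudes $A_i = 1/\sqrt{L}$, so that $\sum_i A_i^2 = 1$, and i.i.d. uniform phases. These choices and the helper name are illustrative. The resulting $h$ should satisfy the following:

- $E|h|^2 \approx 1$;
- its real and imaginary parts have variance about $1/2$;
- $|h|^2$ is approximately exponential, e.g., $P[|h|^2 < 0.1] \approx 1 - e^{-0.1}$.

```python
import cmath
import math
import random


def rich_scattering_gain(L, rng):
    """h = sum_i A_i exp(j theta_i), equal amplitudes, phases uniform on [0, 2 pi)."""
    amp = 1 / math.sqrt(L)
    return sum(amp * cmath.exp(1j * rng.uniform(0, 2 * math.pi)) for _ in range(L))


rng = random.Random(7)
L, trials = 30, 10000
samples = [rich_scattering_gain(L, rng) for _ in range(trials)]

mean_power = sum(abs(h) ** 2 for h in samples) / trials        # approx. 1
var_re = sum(h.real ** 2 for h in samples) / trials            # approx. 1/2 (mean is zero)
var_im = sum(h.imag ** 2 for h in samples) / trials            # approx. 1/2 (mean is zero)
frac_below = sum(abs(h) ** 2 < 0.1 for h in samples) / trials  # approx. 1 - e^{-0.1}
print(mean_power, var_re, var_im, frac_below, 1 - math.exp(-0.1))
```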


8.4.4  Diversity 

When the transmitter and receiver each have only one antenna ($M = N = 1$), the SNR under the rich scattering model is given by

$$\mathrm{SNR} = \frac{|h|^2}{2\sigma^2} \qquad (8.58)$$
Since $h$ is a random variable, so is the SNR. In fact, since $\mathrm{Re}(h)$ and $\mathrm{Im}(h)$ are i.i.d. $\mathcal{N}(0, \frac{1}{2})$, the sum of their squares is an exponential random variable (see Problems 5.11 and 5.21). Taking into account the scaling by $2\sigma^2$, we can show that SNR is an exponential random variable with mean equal to the average SNR, $\overline{\mathrm{SNR}} = \frac{1}{2\sigma^2}$. Suppose we now design our coded modulation strategy for a nominal SNR of $\mathrm{SNR}_0$. We say that the system is in outage when the SNR is smaller than this value. The probability of outage is given by

$$P_{\mathrm{out}} = P[\mathrm{SNR} < \mathrm{SNR}_0] = 1 - e^{-\mathrm{SNR}_0/\overline{\mathrm{SNR}}} \qquad (8.59)$$

We would typically choose the nominal SNR, $\mathrm{SNR}_0$, to be smaller than the average SNR, $\overline{\mathrm{SNR}}$, by a link margin. For example, a link margin of 10 dB gives $\mathrm{SNR}_0 = 0.1\,\overline{\mathrm{SNR}}$, so that $P_{\mathrm{out}} = 1 - e^{-0.1} \approx 0.1$. (Since $e^x \approx 1 + x$ for small $|x|$, we have $1 - e^{-x} \approx x$.) Thus, even after giving up 10 dB in link margin, we still get a relatively high outage rate of 10%.

There is also a nontrivial probability that the SNR with fading is higher than the nominal. We can therefore have negative link margins, if we are willing to live with large enough outage rates. For example, a link margin of -3 dB corresponds to $\mathrm{SNR}_0 = 2\,\overline{\mathrm{SNR}}$, with outage rate $P_{\mathrm{out}} = 1 - e^{-2} = 0.865$. This is too high for most practical applications.
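The outage expression (8.59) is easy to evaluate for any link margin. Here is a minimal Python sketch; the function name is hypothetical. The margin in dB is converted to the ratio $\mathrm{SNR}_0/\overline{\mathrm{SNR}} = 10^{-\text{margin}/10}$.

```python
import math


def outage_siso(link_margin_db):
    """P_out = 1 - exp(-SNR0/SNRbar) for Rayleigh fading, eq. (8.59)."""
    ratio = 10 ** (-link_margin_db / 10)   # SNR0 / SNRbar
    return 1 - math.exp(-ratio)


for margin in (10, 3, 0, -3):
    print(f"margin {margin:>3} dB: P_out = {outage_siso(margin):.4f}")
```

A margin of -3 dB corresponds to a ratio of about 1.995 rather than exactly 2. The computed outage is therefore about 0.864, consistent with the rounded value above.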

In  order  to  reduce  the  outage  rate  without  increasing  the  link  margin,  we  must  employ  diversity, 
which  is  a  generic  term  used  for  any  strategy  that  gets  multiple,  approximately  independent, 
“looks”  at  a  fading  channel.  We  saw  diversity  in  action  for  our  simulation-based  model  in 
Software  Lab  2.2.  We  now  explore  it  for  the  rich  scattering  model,  skipping  some  details  in  the 
derivation  in  the  interest  of  arriving  quickly  at  the  key  insights. 

Benchmark: We continue to define our link margin relative to an unfaded single input single output (SISO) system with average SNR $\overline{\mathrm{SNR}} = \frac{1}{2\sigma^2}$.

Receive diversity: Consider a receiver equipped with two antennas ($N = 2$). If the antennas are spaced far enough apart in a rich scattering environment, we can assume that the channel gains they see (for a given subcarrier) are i.i.d. $\mathcal{CN}(0, 1)$ random variables. The received samples at the two antennas are modeled as

$$y_n = h_n x + w_n, \quad n = 1, 2$$

Here $x$ is the transmitted symbol, and $h_1, h_2 \sim \mathcal{CN}(0, 1)$ are i.i.d. (independent Rayleigh fading). The noise samples $w_1, w_2 \sim \mathcal{CN}(0, 2\sigma^2)$ are also i.i.d. The optimal decision statistic is obtained using receive beamforming, and is given by

$$Z = h_1^* y_1 + h_2^* y_2 \qquad (8.60)$$

It  can  be  shown  that  the  SNR  is  now  given  by 

SNR  =  G  'SNR  (8.61) 


where  the  gain  relative  to  the  benchmark  SISO  system  is  given  by 

G=|hi|2+|h2p  (8.62) 

We can break this up into two gains. The first is $G_{\mathrm{diversity}} = (|h_1|^2 + |h_2|^2)/2$, due to averaging channel fluctuations across antennas. The second is $G_{\mathrm{coh}} = 2$, due to averaging noise across antennas: the signal terms are combined coherently, so that the phases line up, while the noise terms are combined incoherently across the two receive antennas. Thus,

$$ G = G_{\mathrm{diversity}} \, G_{\mathrm{coh}}, \quad \text{where } G_{\mathrm{diversity}} = \frac{|h_1|^2 + |h_2|^2}{2} \text{ and } G_{\mathrm{coh}} = 2. \tag{8.63} $$
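Here is a quick Monte Carlo check of (8.61) and (8.62). For one fixed channel realization, the empirical SNR of the beamformed statistic, divided by the benchmark SNR, should match $G = |h_1|^2 + |h_2|^2$. The sketch assumes unit-energy BPSK symbols. The channel values and the function name `mrc_output_snr` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def mrc_output_snr(h, snr, num_samples=200_000):
    """Empirical SNR of Z = sum_n h_n^* y_n (receive beamforming) for a fixed channel h.

    Assumes unit-energy BPSK symbols (E_s = 1), so the benchmark SNR = 1 / (2 sigma^2),
    and noise samples w_n ~ CN(0, 2 sigma^2), i.i.d. across antennas.
    """
    sigma2 = 1.0 / (2.0 * snr)                  # noise variance per real dimension
    n_ant = len(h)
    x = rng.choice([-1.0, 1.0], size=num_samples)
    w = np.sqrt(sigma2) * (rng.standard_normal((n_ant, num_samples))
                           + 1j * rng.standard_normal((n_ant, num_samples)))
    y = np.outer(h, x) + w                      # y_n = h_n x + w_n, one row per antenna
    z = np.conj(h) @ y                          # decision statistic (8.60)
    signal = np.sum(np.abs(h) ** 2) * x         # noiseless part of Z: G x
    noise = z - signal
    return np.mean(np.abs(signal) ** 2) / np.mean(np.abs(noise) ** 2)


h = np.array([0.8 - 0.3j, -0.2 + 1.1j])        # one fixed (hypothetical) channel realization
snr = 10.0                                      # benchmark SNR, linear scale
G = np.sum(np.abs(h) ** 2)                      # predicted gain (8.62): 1.98 here
print(mrc_output_snr(h, snr) / snr, G)
```

The function works unchanged for any number of receive antennas, which is the $N$-antenna case discussed next.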
Equations (8.61) and (8.63) generalize directly to $N$ receive antennas:

$$ G = |h_1|^2 + \cdots + |h_N|^2 = G_{\mathrm{diversity}} \, G_{\mathrm{coh}}, \quad G_{\mathrm{diversity}} = \frac{|h_1|^2 + \cdots + |h_N|^2}{N}, \quad G_{\mathrm{coh}} = N \tag{8.64} $$


As $N$ gets large, the fluctuations due to fading get smoothed away, and $G_{\mathrm{diversity}} \to 1$ by the law of large numbers. In practice, however, even small values of $N$ (e.g., $N = 2, 4$) give significant performance gains.
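To see the law of large numbers at work, note that each $|h_i|^2$ is exponential with unit mean and unit variance. As a result, $G_{\mathrm{diversity}}$ has mean 1 and variance $1/N$. The sketch below estimates both quantities by simulation. The function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


def diversity_gain_samples(n_antennas, num_trials=100_000):
    """Samples of G_diversity = (|h_1|^2 + ... + |h_N|^2) / N with h_i i.i.d. CN(0, 1)."""
    h = (rng.standard_normal((num_trials, n_antennas))
         + 1j * rng.standard_normal((num_trials, n_antennas))) / np.sqrt(2)
    return np.mean(np.abs(h) ** 2, axis=1)


for n in (1, 2, 4, 16):
    g = diversity_gain_samples(n)
    print(f"N = {n:2d}: mean {g.mean():.3f}, variance {g.var():.4f} (1/N = {1 / n:.4f})")
```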


Figure 8.20: Probability of outage versus link margin (dB) for receive diversity in $1 \times N$ MIMO systems.


Suppose now that we design our coding and modulation so as to provide reliable performance at a nominal SNR, say $\mathrm{SNR}_0$, which is smaller than the SISO benchmark SNR by a link margin of $L$ dB: $\mathrm{SNR}_0(\mathrm{dB}) = \mathrm{SNR}(\mathrm{dB}) - L(\mathrm{dB})$. The probability of outage is given by

$$ P_{\mathrm{out}} = P[G \cdot \mathrm{SNR} < \mathrm{SNR}_0] = P\left[G < 10^{-L(\mathrm{dB})/10}\right] \tag{8.65} $$

Figure  8.20  plots  the  outage  probability  as  a  function  of  link  margin  for  several  different  values 
of  N.  The  plots  are  obtained  using  the  procedure  outlined  in  Problem  8.12. 
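The sketch below shows one way to produce curves like those in Figure 8.20; it is not necessarily the procedure of Problem 8.12. Since $G$ in (8.64) is a sum of $N$ i.i.d. unit-mean exponential random variables, it has a Gamma (Erlang) distribution with CDF $1 - e^{-g}\sum_{k=0}^{N-1} g^k/k!$. This means (8.65) can be evaluated in closed form. A Monte Carlo estimate is included as a cross-check. The function names are ours.

```python
import math

import numpy as np

rng = np.random.default_rng(2)


def outage_closed_form(link_margin_db, n_antennas):
    """P[G < 10^(-L/10)] for G = |h_1|^2 + ... + |h_N|^2, h_i i.i.d. CN(0, 1)."""
    g = 10.0 ** (-link_margin_db / 10.0)
    partial_sum = sum(g ** k / math.factorial(k) for k in range(n_antennas))
    return 1.0 - math.exp(-g) * partial_sum


def outage_monte_carlo(link_margin_db, n_antennas, num_trials=200_000):
    """The same probability (8.65), estimated by simulating the channel gains."""
    h = (rng.standard_normal((num_trials, n_antennas))
         + 1j * rng.standard_normal((num_trials, n_antennas))) / np.sqrt(2)
    G = np.sum(np.abs(h) ** 2, axis=1)
    return float(np.mean(G < 10.0 ** (-link_margin_db / 10.0)))


for n in (1, 2, 4):
    for L in (0.0, 5.0, 10.0):
        print(f"N = {n}, L = {L:4.1f} dB: closed form {outage_closed_form(L, n):.4f}, "
              f"simulated {outage_monte_carlo(L, n):.4f}")
```

To see the qualitative behavior of Figure 8.20, sweep $L$ over a fine grid and plot on a logarithmic scale. The outage curves fall off more steeply as $N$ grows.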

Transmit diversity: If the transmitter has multiple antennas, it can beamform towards the receiver if it has implicit or explicit feedback regarding the channel, as already noted. When such feedback is not available, we would like to use open loop strategies which provide diversity. Consider a system with a transmitter with two antennas, and a receiver with a single antenna ($M = 2$, $N = 1$). In a MIMO-OFDM system, for a given subcarrier, suppose that transmit antenna 1 sends the sample $x_1$ and transmit antenna 2 sends the sample $x_2$. If the transmitter knows the channel coefficients $h_1$ and $h_2$ from the two transmit antennas to the receive antenna, then it could choose $x_1 = h_1^* x$, $x_2 = h_2^* x$, where $x$ is the symbol to be transmitted. What do we do when $h_1$ and $h_2$ are unknown? In general, if we send $x_1$, $x_2$ from the two transmit elements, then the received sample is given by

$$ y = h_1 x_1 + h_2 x_2 + w \tag{8.66} $$

where $w$ is noise. Thus, if $x_1$ and $x_2$ are two independent symbols, then they interfere with each other at the receiver. On the other hand, if we set $x_1 = x_2 = x/\sqrt{2}$ (normalizing the transmit power across the two antennas to that of a transmitter with a single antenna), then we receive

$$ y = \frac{h_1 + h_2}{\sqrt{2}} \, x + w $$

If $h_1, h_2$ are i.i.d. $\mathcal{CN}(0,1)$, then the effective channel coefficient $h_{\mathrm{eff}} = (h_1 + h_2)/\sqrt{2}$ is also $\mathcal{CN}(0,1)$. This is because a sum of independent zero-mean complex Gaussians is complex Gaussian, here with variance $(1 + 1)/2 = 1$. Thus, we still have Rayleigh fading, and have not made any progress relative to a single antenna transmitter!

An ingenious solution to this problem is the Alamouti space-time code (named after its inventor), which resolves the interference between the signals sent by the two transmit antennas over two time samples. Let $b[1]$ and $b[2]$ be two symbols to be transmitted. For a single antenna transmitter, they would be transmitted in sequence. For the two-antenna transmitter now being considered, using two successive time samples expands the signal space at the receiver to two dimensions. This allows us to orthogonalize the contributions of the two symbols at the receiver. Denoting by $x_i[1]$ and $x_i[2]$, $i = 1, 2$, the samples transmitted from antenna $i$ at two successive time intervals, we set

$$ x_1[1] = \frac{b[1]}{\sqrt{2}}, \quad x_2[1] = \frac{b[2]}{\sqrt{2}}, \qquad x_1[2] = -\frac{b^*[2]}{\sqrt{2}}, \quad x_2[2] = \frac{b^*[1]}{\sqrt{2}} \tag{8.67} $$


where  we  have  again  normalized  the  net  transmit  power  to  that  of  a  single  antenna  system. 
Figure  8.21  depicts  the  operation  of  the  Alamouti  space-time  code,  taking  a  sequence  of  symbols 
as  input,  and  mapping  them  in  groups  of  two  to  a  sequence  of  samples  at  the  output  of  each 
antenna. 


[Figure 8.21: the input sequence $b[1], b[2], b[3], b[4], \ldots$ enters the Alamouti space-time code. Antenna 1 outputs $b[1], -b^*[2], b[3], -b^*[4], \ldots$ and antenna 2 outputs $b[2], b^*[1], b[4], b^*[3], \ldots$]

Figure 8.21: The transmitter in an Alamouti space-time code takes two symbols at a time, and maps them to two consecutive symbols to be sent from each of the two transmit antennas. The input to the space-time encoder is the sequence of symbols to be transmitted, $\{b[n]\}$, while the outputs are the sequences $\{x_i[n]\}$, $i = 1, 2$, to be transmitted from antenna $i$. The $1/\sqrt{2}$ factor for power normalization is omitted from the figure.


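The received samples in the two successive time intervals are given by

$$ \begin{aligned} y[1] &= h_1 x_1[1] + h_2 x_2[1] + w[1] = \frac{h_1}{\sqrt{2}} b[1] + \frac{h_2}{\sqrt{2}} b[2] + w[1] \\ y[2] &= h_1 x_1[2] + h_2 x_2[2] + w[2] = -\frac{h_1}{\sqrt{2}} b^*[2] + \frac{h_2}{\sqrt{2}} b^*[1] + w[2] \end{aligned} \tag{8.68} $$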

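We assume that the receiver has estimates of the channel coefficients $h_1$ and $h_2$ (e.g., using known training signals). We would like to write the two observations as a received vector in which each symbol modulates a different signal vector. Since the symbols are conjugated when sent over the second time interval, we conjugate the second received sample when creating the received vector. This yields the following vector model:

$$ \mathbf{y} = \begin{pmatrix} y[1] \\ y^*[2] \end{pmatrix} = b[1] \frac{1}{\sqrt{2}} \begin{pmatrix} h_1 \\ h_2^* \end{pmatrix} + b[2] \frac{1}{\sqrt{2}} \begin{pmatrix} h_2 \\ -h_1^* \end{pmatrix} + \begin{pmatrix} w[1] \\ w^*[2] \end{pmatrix} = b[1]\,\mathbf{u}_1 + b[2]\,\mathbf{u}_2 + \mathbf{w} \tag{8.69} $$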

The vectors $\mathbf{u}_1 = \frac{1}{\sqrt{2}}(h_1, h_2^*)^T$ and $\mathbf{u}_2 = \frac{1}{\sqrt{2}}(h_2, -h_1^*)^T$ are orthogonal (i.e., $\mathbf{u}_1^H \mathbf{u}_2 = 0$) regardless of the values of the channel coefficients. Hence the symbols $b[1]$ and $b[2]$ do not interfere with each other. The noise vector is $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, 2\sigma^2 \mathbf{I})$. The optimal decision statistic $Z[i]$ for symbol $b[i]$ is given by matched filtering against $\mathbf{u}_i$:

$$ Z[i] = \mathbf{u}_i^H \mathbf{y}, \quad i = 1, 2 \tag{8.70} $$

Exercise: Write out these decision statistics explicitly in terms of $y[1]$, $y[2]$, $h_1$, $h_2$.

Answer: $Z[1] = h_1^* y[1] + h_2 y^*[2]$, $Z[2] = h_2^* y[1] - h_1 y^*[2]$ (up to scale).
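The encoder (8.67) and the decision statistics (8.70) each fit in a few lines. The sketch below uses hypothetical function names and a noiseless channel, so that the orthogonality is easy to check. It confirms that each $Z[i]$ equals $\|\mathbf{u}_i\|^2 b[i] = \frac{|h_1|^2 + |h_2|^2}{2}\, b[i]$, with no leakage from the other symbol in the pair.

```python
import numpy as np


def alamouti_encode(b):
    """Split symbols b[1], b[2], ... (even count) into the two antenna streams of (8.67)."""
    b = np.asarray(b, dtype=complex)
    first, second = b[0::2], b[1::2]                       # b[1], b[3], ... and b[2], b[4], ...
    x1 = np.empty_like(b)
    x2 = np.empty_like(b)
    x1[0::2], x2[0::2] = first, second                     # first time interval of each pair
    x1[1::2], x2[1::2] = -np.conj(second), np.conj(first)  # second time interval
    return x1 / np.sqrt(2), x2 / np.sqrt(2)                # total power equals one antenna's


def alamouti_decode(y, h1, h2):
    """Decision statistics Z[i] = u_i^H y of (8.70), computed pair by pair."""
    y_a, y_b = y[0::2], y[1::2]
    z = np.empty(len(y), dtype=complex)
    z[0::2] = (np.conj(h1) * y_a + h2 * np.conj(y_b)) / np.sqrt(2)
    z[1::2] = (np.conj(h2) * y_a - h1 * np.conj(y_b)) / np.sqrt(2)
    return z


rng = np.random.default_rng(3)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
b = rng.choice(qpsk, size=8)
h1, h2 = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
x1, x2 = alamouti_encode(b)
y = h1 * x1 + h2 * x2                                      # noiseless, to isolate the orthogonality
z = alamouti_decode(y, h1, h2)
gain = (abs(h1) ** 2 + abs(h2) ** 2) / 2                   # ||u_i||^2 = G_Alamouti of (8.72)
print(np.allclose(z, gain * b))                             # True
```

Adding noise $w[1], w[2]$ to `y` leaves the signal part of each $Z[i]$ unchanged. The added noise has variance $\|\mathbf{u}_i\|^2 \cdot 2\sigma^2$, which leads directly to the SNR expression that follows.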

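The SNR seen by each symbol, with $E_s = E[|b[i]|^2]$ the symbol energy, is given by

$$ \mathrm{SNR}_{\mathrm{Alamouti}} = \frac{\|\mathbf{u}_i\|^2 E_s}{2\sigma^2} = \frac{|h_1|^2 + |h_2|^2}{2} \cdot \frac{E_s}{2\sigma^2} = G_{\mathrm{Alamouti}} \, \mathrm{SNR} \tag{8.71} $$

where

$$ G_{\mathrm{Alamouti}} = \frac{|h_1|^2 + |h_2|^2}{2} \tag{8.72} $$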


Comparing with (8.64) for receive diversity, we see that the Alamouti scheme in a $2 \times 1$ system achieves the same diversity gain as receive diversity in a $1 \times 2$ system. However, it does not provide the coherent gain $G_{\mathrm{coh}} = 2$ obtained from averaging noise across receive antennas in the latter, so it needs about 3 dB more link margin for the same outage rate. Of course, as we see in Problem 8.13 and in Software Lab 8.3, the Alamouti scheme applies to $2 \times N$ MIMO systems for arbitrary $N$, so that we can get noise averaging and receive diversity gains for $N > 1$. The outage rates computed in Problem 8.13 are plotted in Figure 8.22.
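The comparison can be made concrete for $N = 1$ with a Monte Carlo illustration (variable names are our own). The naive scheme $x_1 = x_2 = x/\sqrt{2}$ behaves exactly like a single Rayleigh link. The Alamouti gain (8.72) is the diversity gain alone. $1 \times 2$ receive diversity adds the coherent gain of 2. As a result, the Alamouti outage at margin $L$ equals the receive-diversity outage at margin $L - 10\log_{10}2 \approx L - 3$ dB.

```python
import numpy as np

rng = np.random.default_rng(4)
num_trials = 400_000
h = (rng.standard_normal((num_trials, 2)) + 1j * rng.standard_normal((num_trials, 2))) / np.sqrt(2)
power = np.abs(h) ** 2                                   # |h_1|^2 and |h_2|^2 per trial

gains = {
    "naive": np.abs(h.sum(axis=1)) ** 2 / 2,             # |h_eff|^2 with h_eff = (h_1 + h_2)/sqrt(2)
    "alamouti_2x1": power.sum(axis=1) / 2,               # G_Alamouti, (8.72)
    "rx_diversity_1x2": power.sum(axis=1),               # G, (8.62)
}


def outage(gain_samples, link_margin_db):
    """Empirical outage probability P[G < 10^(-L/10)], as in (8.65)."""
    return float(np.mean(gain_samples < 10.0 ** (-link_margin_db / 10.0)))


for name, g in gains.items():
    print(f"{name:17s}", [round(outage(g, L), 4) for L in (5.0, 10.0, 15.0)])
```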


Figure 8.22: Probability of outage versus link margin (dB) for Alamouti space-time coding for $2 \times N$ MIMO with rich scattering.


The  simplicity  of  the  Alamouti  construction  (and  its  optimality  for  2x1  MIMO)  has  led  to  its 
adoption  by  a  number  of  cellular  and  WiFi  standards  (just  do  an  Internet  search  to  see  this). 
Unfortunately,  the  orthogonalization  provided  by  the  Alamouti  space-time  code  does  not  scale  to 
more  than  two  transmit  antennas.  There  are  a  number  of  “quasi-orthogonal”  constructions  that 
have  been  investigated,  but  these  have  had  less  impact  on  practice.  Indeed,  when  there  are  a 
large  number  of  transmit  antennas,  the  trend  is  to  engineer  the  system  so  that  the  transmitter  has 
enough  information  about  the  channel  to  perform  some  form  of  transmit  beamforming  (possibly 
using  multiple  beams). 
8.4.5 Spatial multiplexing


We  have  already  seen  that  a  transceiver  with  multiple  antennas  can  use  SDMA  to  communicate 
with  multiple  nodes  at  different  locations.  For  example,  a  cellular  base  station  with  multiple 
antennas can use the same time-frequency resources to send data streams in parallel to different
mobile  devices  (even  if  such  devices  only  have  one  antenna  each).  When  both  transmitter  and 
receiver  have  multiple  antennas,  if  the  propagation  environment  is  “rich  enough,”  then  multiple 
parallel  data  streams  can  be  sent  between  transmitter  and  receiver.  This  is  termed  spatial 
multiplexing.  Figure  8.23  depicts  spatial  multiplexing  with  two  antennas,  modeling,  for  example, 
one  subcarrier  in  a  MIMO-OFDM  system.  The  per-stream  symbol  rate  1/T  is  the  rate  of  sending 
symbols  along  a  subcarrier,  where  T  is  the  length  of  an  OFDM  symbol.  With  M-fold  spatial 
multiplexing,  the  aggregate  symbol  rate  for  a  given  subcarrier  is  M/T.  This  should  be  scaled  up 
by  the  number  of  subcarriers  to  get  the  overall  symbol  rate. 


[Figure 8.23: the input sequence $b[1], b[2], b[3], b[4], \ldots$ (aggregate symbol rate $2/T$) is split so that antenna 1 sends $b[1], b[3], \ldots$ and antenna 2 sends $b[2], b[4], \ldots$ (per-stream symbol rate $1/T$).]

Figure 8.23: For spatial multiplexing, the transmitter may take a sequence of incoming symbols $\{b[n]\}$, and do a serial-to-parallel conversion to map them to subsequences to be transmitted from the different antennas. In the example shown, the odd symbols are transmitted from antenna 1, and the even symbols from antenna 2. The aggregate symbol rate is twice the per-stream symbol rate.


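For example, suppose that the transmitter and receiver in a MIMO-OFDM system each have two antennas ($M = N = 2$). For a given subcarrier in an OFDM system, consider a particular time interval. Suppose that the transmitter sends $x_1$ from antenna 1 and $x_2$ from antenna 2 (referring to Figure 8.23, $x_1 = b[1]$ and $x_2 = b[2]$ in the first time interval). The samples at the two receive elements are given by

$$ \begin{aligned} y_1 &= H_{11} x_1 + H_{12} x_2 + w_1 \\ y_2 &= H_{21} x_1 + H_{22} x_2 + w_2 \end{aligned} $$

which we can write in vector form as

$$ \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = x_1 \begin{pmatrix} H_{11} \\ H_{21} \end{pmatrix} + x_2 \begin{pmatrix} H_{12} \\ H_{22} \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = x_1 \mathbf{u}_1 + x_2 \mathbf{u}_2 + \mathbf{w} \tag{8.73} $$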


where $\mathbf{u}_1$ is the response seen by transmit element 1 at the two receive antennas, $\mathbf{u}_2$ is the response seen by transmit element 2 at the receive antennas, and $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, 2\sigma^2 \mathbf{I})$ is complex WGN. While we have considered a $2 \times 2$ MIMO system for illustration, the model is generally applicable for $2 \times N$ MIMO systems with $N > 2$. In that case, $\mathbf{u}_1$ and $\mathbf{u}_2$ denote the two columns of the channel matrix $\mathbf{H}$, corresponding to the received responses for each of the two transmit antennas, respectively.

In  a  MIMO-OFDM  system,  we  have  eliminated  interference  across  subcarriers  using  OFDM,  but 
we  have  introduced  interference  in  space  by  sending  multiple  symbols  from  different  antennas. 
The  vector  spatial  interference  model  (8.73)  for  each  subcarrier  is  analogous  to  the  vector  time 
domain  interference  model  for  ISI  in  singlecarrier  systems  discussed  in  Chapter  6.  Just  as  we  can 
compensate  for  ISI  using  a  time-domain  equalizer  if  the  time  domain  channel  has  appropriate 
characteristics, we can compensate for spatial interference using a spatial equalizer if the spatial channel has appropriate characteristics. For example, if $\mathbf{u}_1$ and $\mathbf{u}_2$ are linearly independent, then we can use linear ZF or MMSE techniques to demodulate the symbols $x_1$ and $x_2$. Thus, if there are $M$ parallel data streams being sent from the transmit antennas, then we need at least $M$ receive antennas in order to obtain a signal space of large enough dimension for the linear independence condition to be satisfied. Indeed, it can be shown more generally, without restricting ourselves to ZF or MMSE techniques, that for rich scattering models the capacity of an $M \times N$ MIMO channel scales linearly (at high SNR) with $\min(M, N)$, the minimum of the number of transmit and receive antennas.

The ZF and MMSE receivers have been discussed in detail in Section 8.2.2. For our purpose, the relevant expressions are those for complex-valued signals in (8.35). We reproduce the expression for the ZF correlator here before adapting it to our present purpose.

$$ \mathbf{c}_{\mathrm{ZF}} = \mathbf{C}_w^{-1} \mathbf{U} \left( \mathbf{U}^H \mathbf{C}_w^{-1} \mathbf{U} \right)^{-1} \mathbf{e} $$

Recall that $\mathbf{U}$ is a matrix containing the signal vectors as columns, and that $\mathbf{e}$ is a unit vector with nonzero entry corresponding to the desired vector $\mathbf{u}_0$ in the ISI vector model (8.13). In our spatial multiplexing model (8.73), $\mathbf{U} = \mathbf{H}$ (the signal vectors are simply the columns of the channel matrix) and the noise covariance is $\mathbf{C}_w = 2\sigma^2 \mathbf{I}$. We also wish to demodulate the data corresponding to both of the signal vectors. Letting $\mathbf{e}_1 = (1, 0)^T$ and $\mathbf{e}_2 = (0, 1)^T$, the ZF correlators for the two streams can be written as (dropping scale factors corresponding to the noise variance)

$$ \mathbf{c}_1 = \mathbf{H} (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{e}_1, \quad \mathbf{c}_2 = \mathbf{H} (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{e}_2 $$

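We can represent this compactly as a single ZF matrix $\mathbf{C}_{\mathrm{ZF}} = [\mathbf{c}_1 \; \mathbf{c}_2]$ containing these correlators as columns. Noting that $[\mathbf{e}_1 \; \mathbf{e}_2] = \mathbf{I}$, we obtain

$$ \mathbf{C}_{\mathrm{ZF}} = \mathbf{H} (\mathbf{H}^H \mathbf{H})^{-1} \quad \text{(ZF matrix for spatial demultiplexing)} \tag{8.74} $$

The decision statistics for the multiplexed streams are given by

$$ \mathbf{Z} = \mathbf{C}_{\mathrm{ZF}}^H \mathbf{y} \tag{8.75} $$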

While we have focused on the $2 \times 2$ example (8.73) in this derivation, it applies in general to $M$ spatially multiplexed streams in an $M \times N$ MIMO system with $N \geq M$. The outage rate with zero-forcing reception for $2 \times 2$ and $2 \times 4$ is plotted in Figure 8.24, using software developed in Problem 8.14.
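The ZF demultiplexer (8.74)-(8.75) can be sketched directly with NumPy. The channel draw, the QPSK symbols, and the noise level below are hypothetical. The check of interest is that $\mathbf{C}_{\mathrm{ZF}}^H \mathbf{H} = \mathbf{I}$, so each decision statistic contains only its own stream plus noise. The sketch also computes the per-stream noise variance $2\sigma^2\|\mathbf{c}_i\|^2$, which shows the noise enhancement.

```python
import numpy as np

rng = np.random.default_rng(5)


def zf_matrix(H):
    """C_ZF = H (H^H H)^{-1} from (8.74); H must have linearly independent columns."""
    return H @ np.linalg.inv(H.conj().T @ H)


M, N = 2, 4                                     # streams (transmit antennas), receive antennas
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
x = rng.choice(qpsk, size=M)                    # one unit-energy symbol per transmit antenna
sigma2 = 0.01                                   # hypothetical noise variance per real dimension
w = np.sqrt(sigma2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = H @ x + w                                   # spatial multiplexing model (8.73)

C_zf = zf_matrix(H)
Z = C_zf.conj().T @ y                           # decision statistics (8.75)
print(np.round(C_zf.conj().T @ H, 12))          # identity matrix: no inter-stream interference
# Noise enhancement: the noise in stream i has variance 2 sigma^2 ||c_i||^2.
print(2 * sigma2 * np.sum(np.abs(C_zf) ** 2, axis=0))
```

When $\mathbf{H}$ has full column rank, `np.linalg.pinv(H)` returns $(\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H$, which is exactly $\mathbf{C}_{\mathrm{ZF}}^H$, so either form can be used.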

The MMSE receiver can be similarly derived to be

$$ \mathbf{C}_{\mathrm{MMSE}} = \left(\mathbf{H}\mathbf{H}^H + 2\sigma^2 \mathbf{I}\right)^{-1} \mathbf{H} \quad \text{(MMSE matrix for spatial demultiplexing)} \tag{8.76} $$

where we have normalized the transmitted symbols to unit energy ($E[|b[n]|^2] = 1$).
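The MMSE matrix (8.76) is just as easy to compute. A short experiment shows that MMSE reception approaches ZF at high SNR: as $\sigma^2 \to 0$, $\mathbf{C}_{\mathrm{MMSE}} \to \mathbf{C}_{\mathrm{ZF}}$. This follows from the identity $(\mathbf{H}\mathbf{H}^H + 2\sigma^2\mathbf{I})^{-1}\mathbf{H} = \mathbf{H}(\mathbf{H}^H\mathbf{H} + 2\sigma^2\mathbf{I})^{-1}$. The $4 \times 2$ channel below is a random draw, used only for illustration.

```python
import numpy as np


def mmse_matrix(H, sigma2):
    """C_MMSE = (H H^H + 2 sigma^2 I)^{-1} H from (8.76), assuming unit-energy symbols."""
    n_rx = H.shape[0]
    return np.linalg.solve(H @ H.conj().T + 2 * sigma2 * np.eye(n_rx), H)


def zf_matrix(H):
    """C_ZF = H (H^H H)^{-1} from (8.74)."""
    return H @ np.linalg.inv(H.conj().T @ H)


rng = np.random.default_rng(6)
H = (rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))) / np.sqrt(2)
for sigma2 in (1.0, 1e-2, 1e-4, 1e-6):
    gap = np.max(np.abs(mmse_matrix(H, sigma2) - zf_matrix(H)))
    print(f"sigma^2 = {sigma2:g}: max |C_MMSE - C_ZF| = {gap:.2e}")
```

At moderate SNR the two matrices differ noticeably. In this regime, MMSE trades a small amount of residual interference for less noise enhancement.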

It  is  interesting  to  compare  the  spatial  multiplexing  model  (8.73)  with  the  diversity  model  (8.69) 
for  the  Alamouti  space-time  code.  The  Alamouti  code  does  not  rely  on  the  receiver  having 
multiple  antennas,  and  therefore  uses  time  to  create  enough  dimensions  for  two  symbols  to  be 
sent in parallel. Furthermore, the vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ in the Alamouti model (8.69) are constructed such that they are orthogonal, regardless of the propagation channel. In contrast, the spatial multiplexing model (8.73) relies on nature to provide vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ that are “different
enough”  to  support  two  parallel  symbols.  We  explore  these  differences  in  Software  Lab  8.3.  The 
spectral  efficiency  of  spatial  multiplexing  is  twice  that  of  the  Alamouti  code,  but  the  diversity 
gain  that  it  sees  is  smaller,  as  is  evident  from  a  comparison  of  the  outage  rate  versus  link  margin 
curves  in  Figures  8.22  and  8.24.  It  is  possible  to  systematically  quantify  the  tradeoff  between 
diversity  and  multiplexing,  but  this  is  beyond  our  scope  here. 
Figure 8.24: Outage rate versus link margin (with respect to the SISO benchmark) for $2 \times N$ spatial multiplexing ($N = 2, 4$) with zero-forcing reception.


8.5  Concept  Inventory 


This  chapter  provides  an  introduction  to  modeling  and  equalization  for  communication  over 
dispersive  channels,  including  singlecarrier  and  OFDM  modulation.  All  models  and  algorithms 
are  developed  in  complex  baseband,  so  that  upconversion  and  downconversion  are  not  explicitly 
modeled. 

Modeling  of  singlecarrier  systems 

•  Symbols in a linearly modulated system pass through a cascade of the transmit, channel and receive filters, where the cascade typically does not satisfy the Nyquist criterion for ISI avoidance. Any technique for handling the resulting ISI is termed equalization.

•  Receiver  noise  is  modeled  as  AWGN  passed  through  the  receive  filter. 

•  Eye diagrams enable visualization of the effect of various ISI patterns, and equalization techniques are needed for reliable demodulation if the eye is closed.

Linear  equalization 

•  The decision statistic for a given symbol is computed by a linear operation on a vector of received samples in an observation interval that is typically chosen to be large enough that it contains a significant contribution from the symbol of interest. Observation intervals for successive symbols are offset by the symbol time, so that the statistics of the desired symbol and the ISI are identical across observation intervals.

•  The linear MMSE equalizer minimizes the MSE between the decision statistic and the desired symbol, and also maximizes the SINR over the class of linear equalizers.

•  The  MSE  and  the  MMSE  equalizer  can  be  expressed  in  terms  of  statistical  averages,  hence 
the  MMSE  equalizer  can  be  computed  adaptively  by  replacing  statistical  averages  by  empirical 
averages.  Such  adaptive  implementation  requires  transmission  of  a  known  training  sequence. 

•  The received vector over an observation interval is the sum of the desired symbol modulating a desired vector, interfering symbols modulating interference vectors, plus a noise vector. This vector ISI model can be characterized completely if we know the transmit filter, the channel filter, the receive filter, and the noise statistics at the input to the receive filter.
•  Explicit analytical formulas can be given for the ZF and MMSE equalizers, and the associated SINRs, once the vector ISI model is specified.

•  At high SNR, the MMSE equalizer tends to the ZF equalizer. For white noise, the ZF equalizer can be interpreted geometrically as projecting the received vector orthogonal to the interference subspace spanned by the interference vectors, thus nulling out the ISI while incurring noise enhancement. The ZF equalizer exists only if the desired vector is linearly independent of the interference vectors.

•  The geometric interpretation and analytical formulas for the ZF and MMSE equalizers developed for white noise can be extended to colored noise, with the derivation using the concept of noise whitening.

OFDM 

•  Since complex exponentials are eigenfunctions of any LTI system, multiple complex exponentials, each modulated by a complex-valued symbol, do not interfere with each other when
transmitted  through  a  dispersive  channel.  Each  complex  exponential  simply  gets  scaled  by  the 
channel  transfer  function  at  that  frequency.  The  task  of  equalization  corresponds  to  undoing  this 
complex  gain  in  parallel  for  each  complex  exponential.  This  is  the  conceptual  basis  for  OFDM, 
which  enables  parallelization  of  the  task  of  equalization  even  for  very  complicated  channels  by 
transmitting  along  subcarriers. 

•  OFDM  can  be  implemented  efficiently  in  DSP  using  an  IDFT  at  the  transmitter  (frequency 
domain  symbols  to  time  domain  samples)  and  a  DFT  at  the  receiver  (time  domain  samples  to 
frequency  domain  observations,  which  are  the  symbols  scaled  by  channel  gain  and  corrupted  by 
noise). 

•  In order to maintain orthogonality across subcarriers (required for parallelization of the task of equalization) when we take the DFT at the receiver, the effect of the channel on the transmitted samples must be that of a circular convolution. Since the physical channel corresponds to linear convolution, OFDM systems emulate circular convolution by inserting a cyclic prefix, at least as long as the channel memory, in the transmitted time domain samples.
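The IDFT/DFT structure and the role of the cyclic prefix can be checked with a few lines of noiseless simulation. This minimal sketch uses NumPy's FFT conventions (`np.fft.ifft` at the transmitter, `np.fft.fft` at the receiver). The number of subcarriers and the channel taps are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
n_sub = 16                                     # number of subcarriers (hypothetical)
h = np.array([1.0, 0.4 - 0.2j, 0.1j])          # hypothetical channel impulse response
cp_len = len(h) - 1                            # prefix must cover the channel memory

X = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=n_sub)  # QPSK, frequency domain
x = np.fft.ifft(X)                             # IDFT: symbols -> time-domain samples
tx = np.concatenate([x[-cp_len:], x])          # insert cyclic prefix
rx = np.convolve(tx, h)                        # physical channel: linear convolution
y = rx[cp_len : cp_len + n_sub]                # drop the prefix and the convolution tail
Y = np.fft.fft(y)                              # DFT: time-domain samples -> subcarriers

H = np.fft.fft(h, n_sub)                       # channel transfer function at the subcarriers
print(np.allclose(Y, H * X))                   # True: one complex gain per subcarrier
X_hat = Y / H                                  # one-tap equalizer per subcarrier

# Without the prefix, the channel acts by linear convolution and the check generally fails.
y_no_cp = np.convolve(x, h)[:n_sub]
print(np.allclose(np.fft.fft(y_no_cp), H * X))
```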

This  section  provides  an  initial  exposure  to  how  multiple  antennas  at  the  transmitter  and  receiver 
can  be  employed  to  enhance  the  performance  of  wireless  systems.  Three  key  techniques,  which 
in  practice  are  combined  in  various  ways,  are  beamforming,  diversity  and  spatial  multiplexing. 

Beamforming  and  Nullforming 

•  The array response for a linear array can be viewed as a mapping from the angle of arrival/departure to a spatial frequency.

•  For an $N$-element array, spatial matched filtering, or beamforming, leads to a factor of $N$ gain in SNR. For transmit beamforming in which each antenna element is transmitting at a fixed power, we obtain an additional power combining gain of $N$.

•  By  forming  a  beam  at  a  desired  transceiver  and  nulls  at  other  transceivers,  an  antenna  array 
can  support  SDMA. 
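Here is a minimal sketch of the $N$-fold beamforming gain. It assumes a uniform linear array with element spacing $d$ (half a wavelength by default). For this array, the spatial frequency is $\omega = 2\pi (d/\lambda)\sin\theta$ and element $k$ sees phase $e^{jk\omega}$. The sign convention and the function names `array_response` and `beamforming_gain` are our choices and may differ from those in the text.

```python
import numpy as np


def array_response(theta, n_elements, spacing_wavelengths=0.5):
    """Uniform linear array response to a plane wave at angle theta (radians from broadside)."""
    omega = 2 * np.pi * spacing_wavelengths * np.sin(theta)   # spatial frequency
    return np.exp(1j * omega * np.arange(n_elements))


def beamforming_gain(beamformer, a):
    """SNR gain |c^H a|^2 / ||c||^2 of correlator c on array response a, in white noise."""
    return np.abs(np.vdot(beamformer, a)) ** 2 / np.vdot(beamformer, beamformer).real


n = 8
a0 = array_response(np.deg2rad(20.0), n)
print(beamforming_gain(a0, a0))                  # spatial matched filter: gain N = 8 (up to rounding)
print(beamforming_gain(np.eye(n)[0], a0))        # a single element: gain 1
angles = np.deg2rad(np.linspace(-90, 90, 7))
pattern = [abs(np.vdot(a0, array_response(t, n))) / n for t in angles]   # normalized beam pattern
```

By the Cauchy-Schwarz inequality, no correlator can beat the matched filter's gain of $N$. The normalized pattern therefore never exceeds 1, and it reaches 1 only in the steered direction.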

MIMO-OFDM  abstraction 

•  Decomposing  a  time  domain  channel  into  subcarriers  using  OFDM  allows  a  simple  model  for 
MIMO  systems,  in  which  the  channel  between  each  pair  of  transmit  and  receive  antennas  is 
modeled as a single complex gain for each subcarrier.

•  When  the  propagation  environment  is  complex  enough,  the  central  limit  theorem  motivates 
modeling  the  channel  gains  between  transmit-receive  antenna  pairs  as  i.i.d.  zero  mean  complex 
Gaussian random variables. We term this the rich scattering model.

•  Under  the  rich  scattering  model,  each  transmit-receive  antenna  pair  sees  Rayleigh  fading,  but 
performance  degradation  due  to  fading  can  be  alleviated  using  diversity. 

Diversity 

•  Diversity  strategies  average  over  fades  by  exploiting  roughly  independent  looks  at  the  channel. 

•  Receive spatial diversity using spatial matched filtering provides a channel averaging gain (averaging the fading gains across antennas) and a noise averaging gain (due to coherent combining across antennas).

•  Transmit spatial diversity provides channel averaging gains alone. It is trickier than receive diversity, since samples transmitted from different transmit antennas can interfere at the receiver.
For  two  transmit  antennas,  the  Alamouti  space-time  code  is  an  optimal  scheme  for  avoiding 
interference  between  different  transmitted  symbols,  while  providing  channel  averaging  gains. 

Spatial  multiplexing 

•  Sending parallel data streams from different antennas increases the symbol rate in proportion to the number of data streams, with space playing a role analogous to bandwidth.

•  The parallel data streams interfere at the receiver, but can be demodulated using spatial equalization techniques analogous to the time domain equalization techniques studied in Chapter 6 (e.g., suboptimal techniques such as ZF and MMSE).


8.6  Endnotes 


While  we  have  shown  that  ISI  in  singlecarrier  systems  can  be  handled  using  linear  equalization, 
significant performance improvements can be obtained using nonlinear strategies, including optimal maximum likelihood sequence estimation (MLSE), whose complexity is often prohibitive
for  long  channels  and/or  large  constellations,  as  well  as  suboptimal  strategies  such  as  decision 
feedback  equalization  (DFE),  whose  complexity  is  comparable  to  that  of  linear  equalization.  An 
introduction  to  such  strategies,  as  well  as  pointers  for  further  reading,  can  be  found  in  more 
advanced  communication  theory  texts  such  as  [7,  8] . 

OFDM  has  now  become  ubiquitous  in  both  wireless  and  wireline  communication  systems  in  recent 
years,  because  it  provides  a  standardized  mechanism  for  parallelizing  equalization  of  arbitrarily 
complicated  channels  in  a  way  that  leverages  the  dropping  cost  and  increasing  speed  of  digital 
computation.  For  more  detail  than  we  have  presented  here,  we  refer  to  the  relevant  chapters  in 
books  on  wireless  communication  by  Goldsmith  [42]  and  Tse  and  Viswanath  [43].  These  should 
provide  the  background  required  to  access  the  huge  research  literature  on  OFDM,  which  focuses 
on  issues  such  as  channel  estimation,  synchronization  and  reduction  of  PAPR. 

There  has  been  an  explosion  of  research  and  development  activity  in  MIMO,  or  space-time 
communication,  starting  from  the  1990s:  this  is  the  decade  in  which  the  large  capacity  gains 
provided  by  spatial  multiplexing  were  pointed  out  by  Foschini  [44]  and  Telatar  [45],  and  the 
Alamouti space-time code was published by Alamouti [46]. MIMO techniques have been incorporated into 3G and 4G (WiMax and LTE) cellular standards, and WiFi (IEEE 802.11n)
standards.  An  excellent  reference  for  exploring  MIMO-OFDM  further  is  the  textbook  by  Tse 
and  Viswanath  [43],  while  a  brief  introduction  is  provided  in  Chapter  8  of  Madhow  [7].  Other 
books  devoted  to  MIMO  include  Paulraj  et  al  [47],  Jafarkhani  [48],  and  the  compilation  edited 
by  Bolcskei  et  al  [49]. 

As  discussed  in  the  epilogue,  a  new  frontier  in  MIMO  is  opening  up  with  research  and  development 
for  wireless  communication  systems  at  higher  carrier  frequencies,  starting  with  the  “millimeter 
wave”  band  (i.e.,  carrier  frequencies  in  the  range  30-300  GHz,  for  which  the  wavelength  is  in  the 
range  1-10  mm).  Of  particular  interest  is  the  60  GHz  band,  where  there  is  a  huge  amount  (7 
GHz!)  of  unlicensed  spectrum,  in  contrast  to  the  crowding  in  existing  cellular  and  WiFi  bands. 
While  fundamental  MIMO  concepts  such  as  beamforming,  diversity,  and  spatial  multiplexing 
still  apply,  the  order  of  magnitude  smaller  carrier  wavelength  and  the  order  of  magnitude  larger 
bandwidth require fundamentally rethinking many aspects of link and network design, as we
briefly  indicate  in  the  epilogue. 
8.7 Problems


ZF  and  MMSE  equalization:  modeling  and  numerical  computations 

Problem  8.1  (Noise  enhancement  computations)  Consider  the  ISI  vector  models  in  (8.10), 
(8.12)  and  (8.11). 

(a) Compute the noise enhancements (dB) for the three equalizer lengths in these models, assuming white noise.

(b) Now, assume that the noise $w[n]$ is colored, with $\mathbf{C}_w$ specified as follows:

r  * = j 

Cw(i,j)  =  <  |»-j|  =  l  (8.77) 

0,  else 

Compute  the  noise  enhancements  for  the  three  equalizer  lengths  considered,  and  compare  with 
your  results  in  (a). 

Problem  8.2  (Noise  enhancement  as  a  function  of  correlator  length)  Now,  consider  the 
discrete  time  channel  model  leading  to  ISI  vector  models  in  (8.10),  (8.12)  and  (8.11). 

(a)  Assuming  white  noise,  compute  and  plot  the  noise  enhancement  (dB)  as  a  function  of  equalizer 
length,  for  L  ranging  from  4  to  16,  increasing  the  observation  interval  by  two  by  adding  one  sample 
to  each  side  of  the  current  observation  interval,  and  starting  from  an  observation  interval  of  length 
L  =  4  lined  up  with  the  impulse  response  for  the  desired  symbol.  Does  the  noise  enhancement 
decrease  monotonically?  Does  it  plateau?  (b)  Repeat  for  colored  noise  as  in  Problem  8.1(b). 

Problem 8.3 (MMSE correlator and SINR computations) Consider the ISI vector model (8.10).

(a) Assume Cw = and define SNR = as the MF bound on achievable SNR. For SNR of 6 dB, compute the MMSE correlator and the corresponding SINR (dB), using (8.34)
and  (8.16).  Check  that  the  results  match  the  alternative  formula  (8.36).  Compare  with  the  SNR 
achieved  by  the  ZF  correlator. 

(b)  Plot  the  SINR  (dB)  of  the  MMSE  and  ZF  correlators  as  a  function  of  the  MF  SNR  (dB). 
Comment  on  any  trends  that  you  notice. 


[Figure 8.25: block diagram of a transmit filter, a channel filter and a receive filter in cascade.]

Figure 8.25: Continuous time model for a link with ISI.


Problem 8.4 (From continuous time to discrete time vector ISI model) In this problem, we discuss an example of how to derive the vector ISI model (8.13) from a continuous time model, using the system shown in Figure 8.25. The symbol rate is $1/T = 1$, and the input to the transmit filter is $\sum_n b[n]\,\delta(t - n)$, where $b[n] \in \{-1, 1\}$. Thus, the continuous time noiseless signal at the output of the receive filter is $\sum_n b[n]\, q(t - n)$, where $q(t) = (g_{Tx} * g_C * g_{Rx})(t)$ is the system response to a single symbol. The noise $n(t)$ at the input to the receive filter is WGN with PSD $\sigma^2 = N_0/2$. Using the material in Section 5.9, the noise $w(t) = (n * g_{Rx})(t)$ at the output of the receive filter is therefore zero mean, WSS, and Gaussian, with autocorrelation/autocovariance function $R_w(\tau) = C_w(\tau) = \sigma^2 (g_{Rx} * g_{Rx,mf})(\tau)$.

(a) Sketch the end-to-end response $q(t)$. Compute the energy per bit $E_b = \|q\|^2$.

Remark: Note that $E_b/N_0 = \|q\|^2/(2\sigma^2)$. If we fix the signal scaling, and hence $\|q\|^2$, then the value of $\sigma^2$ is fixed once we specify $E_b/N_0$.

(b)  Assume  that  the  sampler  operates  at  rate  2/T  =  2,  taking  samples  at  times  t  =  m/2  for 
integer  m.  Show  that  the  discrete  time  end-to-end  response  to  a  single  symbol  (i.e.  the  sampled 
version  of  q{t))  is 

h=(...,0,l,2,-l  -1,-i  0,...) 

(c) For the given sampling rate, show that in a vector ISI model (8.13), the noise covariance matrix satisfies (8.77).

(d)  Specify  the  matrix  U  corresponding  to  the  ISI  model  (8.13)  for  a  linear  equalizer  of  length 
5,  with  observation  interval  lined  up  with  the  channel  response  for  the  desired  symbol. 

(e)  Taking  into  account  the  noise  coloring,  compute  the  optimal  ZF  correlator,  and  its  noise 
enhancement  relative  to  the  matched  filter  bound. 

(f) Compute the MMSE correlator for $E_b/N_0$ of 10 dB (see (a) and the associated remark). What is the output SINR, and how does it compare with the SNR of the ZF correlator in (e)?

ZF  and  MMSE  equalization:  theoretical  derivations 

Problem 8.5 (ZF geometry) For white noise, the output of a ZF correlator satisfying (8.17) and (8.18) is given by

$$ \mathbf{c}^T \mathbf{r}[n] = b[n] + N(0, \sigma^2 \|\mathbf{c}\|^2) $$

Since the signal scaling is fixed, the optimal ZF correlator is one that minimizes the noise variance $\sigma^2 \|\mathbf{c}\|^2$. Thus, the optimal ZF correlator minimizes $\|\mathbf{c}\|^2$ subject to (8.17) and (8.18).

(a) Suppose a correlator $\mathbf{c}_1$ satisfies (8.17) and (8.18), and is a linear combination of the signal vectors $\{\mathbf{u}_k\}$. Now, suppose that we add a component $\Delta\mathbf{c}$ orthogonal to the space spanned by the signal vectors. Show that $\mathbf{c}_2 = \mathbf{c}_1 + \Delta\mathbf{c}$ is also a ZF correlator.

(b) How is the output noise variance for $\mathbf{c}_2$ related to that for $\mathbf{c}_1$?

(c) Conclude that, in order to be optimal, a ZF correlator must lie in the signal subspace spanned by $\{\mathbf{u}_k\}$.

(d) Observe that the condition (8.17) implies that $\mathbf{c}$ must be orthogonal to the interference subspace spanned by $\{\mathbf{u}_k, k \neq 0\}$. Combining with (c), infer that $\mathbf{c}$ must be a scalar multiple of $P_I^{\perp}\mathbf{u}_0$.
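Parts (a) to (d) can be illustrated numerically. The Python/NumPy sketch below uses randomly drawn (hypothetical) real signal vectors. It takes the ZF conditions to be $\langle\mathbf{c},\mathbf{u}_0\rangle = 1$ and $\langle\mathbf{c},\mathbf{u}_k\rangle = 0$ for $k \neq 0$, which is our reading of (8.17) and (8.18). It checks that the correlator built from $P_I^{\perp}\mathbf{u}_0$ meets these conditions. It also shows that adding a component orthogonal to all signal vectors keeps the conditions satisfied but increases $\|\mathbf{c}\|^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 5, 3                        # hypothetical sizes: length-5 observation, 3 signal vectors
U = rng.standard_normal((L, K))    # columns u_{-1}, u_0, u_1; desired vector u_0 is column 1
u0 = U[:, 1]
UI = np.delete(U, 1, axis=1)       # interference vectors {u_k, k != 0}

# Component of u0 orthogonal to the interference subspace (least-squares projection)
a, *_ = np.linalg.lstsq(UI, u0, rcond=None)
p_perp = u0 - UI @ a
c1 = p_perp / (p_perp @ p_perp)    # scalar multiple of P_I^perp u0 with <c1, u0> = 1

# Part (a): add a component orthogonal to every signal vector
v = rng.standard_normal(L)
coef, *_ = np.linalg.lstsq(U, v, rcond=None)
delta_c = v - U @ coef             # residual of v, orthogonal to span{u_k}
c2 = c1 + delta_c

print(U.T @ c1, U.T @ c2)          # both equal (0, 1, 0) up to rounding: both are ZF
print(c1 @ c1, c2 @ c2)            # part (b): ||c2||^2 = ||c1||^2 + ||delta_c||^2
```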


Problem 8.6 (invertibility requirement for computing ZF correlator) The ZF correlator expression (8.22) requires inversion of the matrix $\mathbf{U}^T\mathbf{U}$ of correlations among the signal vectors, with $(i,j)$th entry $\langle\mathbf{u}_i, \mathbf{u}_j\rangle = \mathbf{u}_i^T\mathbf{u}_j$.

(a) Show that

$\mathbf{U}^T\mathbf{U}\mathbf{a} = \mathbf{0}$ (8.78)

if and only if

$\mathbf{U}\mathbf{a} = \mathbf{0}$ (8.79)


Hint: In order to show that (8.78) implies (8.79), suppose that (8.78) holds. Multiply by $\mathbf{a}^T$ and show that you get an expression of the form $\mathbf{x}^T\mathbf{x} = \|\mathbf{x}\|^2 = 0$, which implies $\mathbf{x} = \mathbf{0}$.

(b) Use the result in (a) to infer that $\mathbf{U}^T\mathbf{U}$ is invertible if and only if the signal vectors $\{\mathbf{u}_k\}$ are linearly independent.
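The equivalence in (a) can also be checked numerically. This sketch uses hypothetical random vectors, and the rank tolerance is our own choice. It compares the rank of $\mathbf{U}^T\mathbf{U}$ for independent and for dependent signal vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
u_a, u_b, u_c = rng.standard_normal((3, 4))           # three random length-4 vectors

U_indep = np.column_stack([u_a, u_b, u_c])            # linearly independent (generic draw)
U_dep = np.column_stack([u_a, u_b, u_a - 2 * u_b])    # third column depends on the others


def gram_rank(U):
    """Numerical rank of U^T U, with a relative tolerance to absorb rounding."""
    G = U.T @ U                                       # matrix of correlations u_i^T u_j
    return np.linalg.matrix_rank(G, tol=1e-9 * np.linalg.norm(G))


print(gram_rank(U_indep), gram_rank(U_dep))           # full rank vs. rank-deficient

# a = (1, -2, -1) gives U a = 0 for U_dep, and then also U^T U a = 0, as in (8.78)-(8.79)
a = np.array([1.0, -2.0, -1.0])
print(np.abs(U_dep @ a).max(), np.abs(U_dep.T @ U_dep @ a).max())
```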
Problem 8.7 (Alternative computation of the ZF correlator) Another way to compute the ZF correlator is to develop an expression for $P_I^{\perp}\mathbf{u}_0$ in terms of $\mathbf{u}_0$ and $\mathbf{U}_I$, a matrix containing the interference vectors $\{\mathbf{u}_k, k \neq 0\}$ as columns. That is, $\mathbf{U}_I$ is obtained from the signal matrix $\mathbf{U}$ by deleting the column corresponding to the desired vector $\mathbf{u}_0$. Let us denote the projection of $\mathbf{u}_0$ onto the interference subspace by $P_I\mathbf{u}_0$. By definition, this is a linear combination of the interference vectors, and can be written as


$P_I\mathbf{u}_0 = \mathbf{U}_I\mathbf{a}_I$ (8.80)

The orthogonal projection $P_I^{\perp}\mathbf{u}_0$ is therefore given by

$P_I^{\perp}\mathbf{u}_0 = \mathbf{u}_0 - P_I\mathbf{u}_0 = \mathbf{u}_0 - \mathbf{U}_I\mathbf{a}_I$ (8.81)

(a) Note that $P_I^{\perp}\mathbf{u}_0$ must be orthogonal to each of the interference vectors $\{\mathbf{u}_k, k \neq 0\}$, hence (going directly to the general complex-valued setting)

$\mathbf{U}_I^H P_I^{\perp}\mathbf{u}_0 = \mathbf{0}$


(b) Infer from (a) that

$\mathbf{a}_I = (\mathbf{U}_I^H\mathbf{U}_I)^{-1}\mathbf{U}_I^H\mathbf{u}_0$

(c) Derive the following explicit expression for the orthogonal projection:

$P_I^{\perp}\mathbf{u}_0 = \mathbf{u}_0 - \mathbf{U}_I(\mathbf{U}_I^H\mathbf{U}_I)^{-1}\mathbf{U}_I^H\mathbf{u}_0$ (8.82)


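(d) Derive the following explicit expressions for the energies of the projection onto the interference subspace and of the orthogonal projection:

$\|P_I\mathbf{u}_0\|^2 = \mathbf{u}_0^H\mathbf{U}_I(\mathbf{U}_I^H\mathbf{U}_I)^{-1}\mathbf{U}_I^H\mathbf{u}_0$

$\|P_I^{\perp}\mathbf{u}_0\|^2 = \|\mathbf{u}_0\|^2 - \mathbf{u}_0^H\mathbf{U}_I(\mathbf{U}_I^H\mathbf{U}_I)^{-1}\mathbf{U}_I^H\mathbf{u}_0 = \mathbf{u}_0^H\left(\mathbf{I} - \mathbf{U}_I(\mathbf{U}_I^H\mathbf{U}_I)^{-1}\mathbf{U}_I^H\right)\mathbf{u}_0$ (8.83)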


(e) Note that (8.82) and (8.83), together with (8.27), give us an expression for a ZF correlator $\mathbf{c}_{ZF}$ scaled such that $\langle\mathbf{c}_{ZF}, \mathbf{u}_0\rangle = 1$.
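The recipe of Problem 8.7 translates directly into code. The sketch below evaluates (8.82) with a linear solve instead of an explicit inverse. It then scales the result by $1/\|P_I^{\perp}\mathbf{u}_0\|^2$, which makes the correlation with $\mathbf{u}_0$ equal to one. The function name and the random complex test matrix are our own. The checks use $\mathbf{c}^H\mathbf{u}_k$, matching a decision statistic of the form $\mathbf{c}^H\mathbf{r}[n]$:

```python
import numpy as np


def zf_correlator(U, k0):
    """ZF correlator via (8.82), scaled so that c^H u_{k0} = 1.

    U  : L x K signal matrix (complex allowed); column k0 is the desired vector u0.
    Returns the correlator and the energy ||P_I^perp u0||^2 from (8.83).
    """
    u0 = U[:, k0]
    UI = np.delete(U, k0, axis=1)                              # interference matrix U_I
    a = np.linalg.solve(UI.conj().T @ UI, UI.conj().T @ u0)    # a_I = (U_I^H U_I)^{-1} U_I^H u0
    p_perp = u0 - UI @ a                                       # P_I^perp u0, eq. (8.82)
    energy = np.vdot(p_perp, p_perp).real                      # ||P_I^perp u0||^2
    return p_perp / energy, energy


rng = np.random.default_rng(0)
U = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))   # hypothetical complex U
c_zf, e_perp = zf_correlator(U, k0=1)
print(np.round(c_zf.conj() @ U, 12))    # c^H u_k: 1 for the desired vector, 0 otherwise
```

A side benefit of this scaling: $\|\mathbf{c}_{ZF}\|^2 = 1/\|P_I^{\perp}\mathbf{u}_0\|^2$, so the output noise variance for white noise is $\sigma^2/\|P_I^{\perp}\mathbf{u}_0\|^2$.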


Problem 8.8 (Analytical expression for MMSE correlator) Let us derive the expression (8.34) for the MMSE correlator for the vector ISI model (8.13). We consider the general scenario of complex-valued symbols and signals. Suppose that the symbols $\{b[n]\}$ in the model are uncorrelated, satisfying

$E[b[n]b^*[m]] = \begin{cases} \sigma_b^2, & m = n \\ 0, & m \neq n \end{cases}$ (8.84)

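We have

$\mathbf{R} = E[\mathbf{r}[n]\mathbf{r}^H[n]] = E\left[\left(\sum_k b[n+k]\mathbf{u}_k + \mathbf{w}[n]\right)\left(\sum_l b[n+l]\mathbf{u}_l + \mathbf{w}[n]\right)^H\right]$

$\mathbf{p} = E[b^*[n]\mathbf{r}[n]] = E\left[b^*[n]\left(\sum_k b[n+k]\mathbf{u}_k + \mathbf{w}[n]\right)\right]$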

Now use (8.84), and the independence of the symbols and the noise, to infer that

$\mathbf{R} = \sigma_b^2\sum_k \mathbf{u}_k\mathbf{u}_k^H + \mathbf{C}_w, \quad \mathbf{p} = \sigma_b^2\mathbf{u}_0$
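Once $\mathbf{R}$ and $\mathbf{p}$ are known, the MMSE correlator is the solution of $\mathbf{R}\mathbf{c} = \mathbf{p}$. The sketch below is a small illustration under hypothetical choices: random complex signal vectors, $\sigma_b^2 = 1$, and white noise with $\mathbf{C}_w = 0.1\,\mathbf{I}$. The SINR helper applies the decision statistic $\mathbf{c}^H\mathbf{r}[n]$ to the model (8.13):

```python
import numpy as np


def mmse_correlator(U, k0, sigma_b2, Cw):
    """c = R^{-1} p with R = sigma_b^2 sum_k u_k u_k^H + C_w and p = sigma_b^2 u_{k0}."""
    R = sigma_b2 * (U @ U.conj().T) + Cw      # sum_k u_k u_k^H = U U^H
    p = sigma_b2 * U[:, k0]
    return np.linalg.solve(R, p)


def output_sinr(c, U, k0, sigma_b2, Cw):
    """SINR of the statistic c^H r[n] under model (8.13)."""
    gains = c.conj() @ U                      # c^H u_k for every k
    signal = sigma_b2 * abs(gains[k0]) ** 2
    interference = sigma_b2 * (np.sum(np.abs(gains) ** 2) - abs(gains[k0]) ** 2)
    noise = (c.conj() @ Cw @ c).real
    return signal / (interference + noise)


rng = np.random.default_rng(2)
U = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))   # hypothetical signal vectors
Cw = 0.1 * np.eye(5)                                                  # assumed white noise covariance
c_mmse = mmse_correlator(U, 1, 1.0, Cw)
print(10 * np.log10(output_sinr(c_mmse, U, 1, 1.0, Cw)))              # output SINR in dB
```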




Problem 8.9 (ZF correlator for colored noise) Consider the model (8.13) where the noise covariance is a positive definite matrix $\mathbf{C}_w$. We now derive the formula (8.30) for the ZF correlator by using a linear transformation to map the system to a white noise setting, for which we have already derived the ZF correlator in (8.22). Specifically, suppose that we apply an invertible matrix $\mathbf{A}$ to the received vector $\mathbf{r}[n]$. We then obtain a transformed received vector


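$\tilde{\mathbf{r}}[n] = \mathbf{A}\mathbf{r}[n] = \sum_k b[n+k]\tilde{\mathbf{u}}_k + \tilde{\mathbf{w}}[n]$ (8.85)

where

$\tilde{\mathbf{u}}_k = \mathbf{A}\mathbf{u}_k$ (8.86)

$\tilde{\mathbf{w}}[n] = \mathbf{A}\mathbf{w}[n] \sim N(\mathbf{0}, \mathbf{A}\mathbf{C}_w\mathbf{A}^T)$ (8.87)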

(a) Suppose that we find a linear correlator $\tilde{\mathbf{c}}$ for the transformed system (8.85), leading to a decision statistic $Z[n] = \tilde{\mathbf{c}}^T\tilde{\mathbf{r}}[n]$. Show that we can write the decision statistic as $Z[n] = \mathbf{c}^T\mathbf{r}[n]$ (i.e., as the output of a linear correlator operating on the original received vector), where

$\mathbf{c} = \mathbf{A}^T\tilde{\mathbf{c}}$ (8.88)

(b) Suppose that we can find $\mathbf{A}$ such that the noise in the transformed system is white:

$\mathbf{A}\mathbf{C}_w\mathbf{A}^T = \mathbf{I}$ (8.89)

Show that

$\mathbf{C}_w^{-1} = \mathbf{A}^T\mathbf{A}$ (8.90)

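(c) Show that the optimal ZF correlator $\tilde{\mathbf{c}}_{ZF}$ for the transformed system is given by the white noise formula (8.22), with the signal matrix $\mathbf{U}$ replaced by the transformed signal matrix with columns $\tilde{\mathbf{u}}_k$, namely $\tilde{\mathbf{U}} = \mathbf{A}\mathbf{U}$ (8.91)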

(d) Show that the optimal ZF correlator $\mathbf{c}_{ZF}$ in the original system is given by (8.30).

Hint:  Use  (8.86),  (8.88)  and  (8.90). 

(e) While we have used whitening only as an intermediate step in deriving the formula (8.30) for the ZF correlator in the original system, we note for completeness that a whitening matrix $\mathbf{A}$ satisfying (8.89) is guaranteed to exist for any positive definite $\mathbf{C}_w$, and spell out two possible choices for $\mathbf{A}$. For example, we can take $\mathbf{A} = \mathbf{B}^{-1}$, where $\mathbf{B}$ is the square root of $\mathbf{C}_w$, which is a symmetric matrix satisfying $\mathbf{C}_w = \mathbf{B}^2$. Another, often more numerically stable, choice is $\mathbf{A} = \mathbf{L}^{-1}$, where $\mathbf{L}$ is the lower triangular matrix obtained by the Cholesky decomposition of $\mathbf{C}_w$, which satisfies $\mathbf{C}_w = \mathbf{L}\mathbf{L}^T$. Matlab functions implementing these are given below:


%square root of Cw
B=sqrtm(Cw); %symmetric matrix

%Cholesky decomposition of Cw
L=chol(Cw,'lower'); %lower triangular matrix


Throughout the preceding problem, replacing transpose by conjugate transpose gives the corresponding results for the complex-valued setting. The Matlab code segment above applies to both real- and complex-valued noise.
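The whitening argument can be checked numerically. This NumPy sketch mirrors the two Matlab choices above: the symmetric square root is computed through an eigendecomposition, and `numpy.linalg.cholesky` returns the lower triangular factor. For the white noise ZF step we assume the form $\tilde{\mathbf{c}} = \tilde{\mathbf{U}}(\tilde{\mathbf{U}}^T\tilde{\mathbf{U}})^{-1}\mathbf{e}$, where $\mathbf{e}$ is the unit vector that selects the desired symbol. The closed form at the end is our own substitution of (8.86), (8.88) and (8.90), not a quotation of (8.30). All matrices are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, K, k0 = 5, 3, 1                       # hypothetical sizes; desired vector is column k0
U = rng.standard_normal((n_obs, K))
G = rng.standard_normal((n_obs, n_obs))
Cw = G @ G.T + 0.5 * np.eye(n_obs)           # hypothetical positive definite noise covariance

# Choice 1: A = B^{-1}, with B the symmetric square root of Cw (via eigendecomposition)
w, V = np.linalg.eigh(Cw)
B = V @ np.diag(np.sqrt(w)) @ V.T
A_sqrt = np.linalg.inv(B)

# Choice 2: A = L^{-1}, with Cw = L L^T (numpy returns the lower-triangular factor)
A_chol = np.linalg.inv(np.linalg.cholesky(Cw))

# Whitened system (8.85)-(8.86), white-noise ZF correlator, then map back with (8.88)
A = A_chol
U_t = A @ U
e = np.zeros(K)
e[k0] = 1.0
c_t = U_t @ np.linalg.solve(U_t.T @ U_t, e)  # satisfies U_t^T c_t = e
c_zf = A.T @ c_t

# Closed form obtained by substituting (8.86), (8.88) and (8.90)
Ci = np.linalg.inv(Cw)
c_closed = Ci @ U @ np.linalg.solve(U.T @ Ci @ U, e)
print(np.allclose(c_zf, c_closed), np.round(U.T @ c_zf, 10))
```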
Problem 8.10 (Effect of array spacing) Consider a regular linear array with $N$ elements and inter-element spacing $d$.

(a) For $N = 8$, plot the beam pattern for a beam directed at 30° from broadside for $d = \lambda/2$.

(b) Repeat (a) for $d = 2\lambda$.

(c)  Comment  on  any  differences  that  you  notice  in  the  beamforming  patterns  in  (a)  and  (b). 

(d) For inter-element spacings beyond $a\lambda$, the maximum of the beam pattern is not unique. That is, the beam pattern takes its maximum value not just in the desired direction, but also in a few other directions. These other maxima are called grating lobes. What is the value of $a$ beyond which you expect to see grating lobes?
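The sketch below computes the beam pattern under three assumptions: element $n$ has array response phase $2\pi (d/\lambda)\, n \sin\theta$, the beam is steered at 30°, and the array has $N = 8$ elements (a choice made for illustration). Printing the directions where the normalized pattern is essentially 1 makes grating lobes easy to spot:

```python
import numpy as np


def beam_pattern(theta_deg, n_elem, d_over_lambda, steer_deg):
    """Normalized pattern |a(theta0)^H a(theta)| / N of a uniform linear array."""
    n = np.arange(n_elem)
    theta = np.deg2rad(np.atleast_1d(theta_deg))[:, None]
    delta = np.sin(theta) - np.sin(np.deg2rad(steer_deg))
    return np.abs(np.exp(2j * np.pi * d_over_lambda * n * delta).sum(axis=1)) / n_elem


theta = np.linspace(-90, 90, 3601)               # 0.05 degree grid
for d in (0.5, 2.0):                             # spacing in wavelengths
    bp = beam_pattern(theta, 8, d, 30.0)         # N = 8 assumed
    peaks = theta[bp > 0.999]                    # grid points at (essentially) the maximum
    print(f"d = {d} lambda: near-maximal directions (deg): {np.unique(np.round(peaks))}")
```

Plotting `bp` against `theta` (in dB, for example) gives the patterns asked for in (a) and (b).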


Problem 8.11 (SDMA) The base station in a cellular network wishes to simultaneously send different data streams to two different mobiles. Assume that it has a linear array with 16 elements uniformly spaced at $\lambda/3$. Mobile A is at angle 20° from broadside and Mobile B is at angle -30° from broadside.

(a) Compute the array responses corresponding to each mobile, and plot the beamforming patterns if the base station were only communicating with one mobile at a time.

(b) Now, suppose that the base station employs SDMA using zero-forcing interference suppression to send to both mobiles simultaneously. Plot the beam patterns used to send to Mobile A and Mobile B, respectively.

(c) What is the noise enhancement in (b) relative to (a)?

(d) Repeat (a)-(c) when Mobile B is at angle 10° from broadside (i.e., closer to Mobile A in angular spacing). You should notice a significant increase in noise enhancement.

(e)  Try  playing  around  with  different  values  of  angular  spacing  between  mobiles  to  determine 
when  the  base  station  should  attempt  to  use  SDMA  (e.g.,  what  is  the  minimum  angular  spacing 
at  which  the  noise  enhancement  is,  say,  less  than  3  dB). 
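The sketch below quantifies the noise enhancement under our own modeling assumptions. The ZF beam toward Mobile A is taken as the component of its array response orthogonal to Mobile B's response. For a fixed transmit power, the received SNR then drops by the factor $1 - |\rho|^2$, where $\rho$ is the normalized correlation between the two array responses. The array has 16 elements spaced at $\lambda/3$, as in the problem. The helper names are hypothetical:

```python
import numpy as np


def array_response(theta_deg, n_elem=16, d_over_lambda=1/3):
    """Array response of the 16-element, lambda/3-spaced array toward theta (from broadside)."""
    n = np.arange(n_elem)
    return np.exp(2j * np.pi * d_over_lambda * n * np.sin(np.deg2rad(theta_deg)))


def zf_beam(a_target, a_other):
    """Unit-norm ZF beam toward a_target with a null toward a_other."""
    proj = a_target - a_other * (np.vdot(a_other, a_target) / np.vdot(a_other, a_other))
    return proj / np.linalg.norm(proj)


def noise_enhancement_db(theta_a, theta_b):
    """SNR loss of the ZF beam relative to a matched beam, 1 / (1 - |rho|^2), in dB."""
    a, b = array_response(theta_a), array_response(theta_b)
    rho2 = abs(np.vdot(a, b)) ** 2 / (np.linalg.norm(a) ** 2 * np.linalg.norm(b) ** 2)
    return -10 * np.log10(1 - rho2)


for theta_b in (-30, 10):
    print(theta_b, round(noise_enhancement_db(20, theta_b), 3))
```

The beam pattern for (b) is $|\mathbf{w}^H\mathbf{a}(\theta)|$ evaluated over a grid of angles, with `w = zf_beam(...)`.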


Problem 8.12 (Outage rates with receive diversity) Consider a $1 \times N$ MIMO system with receive diversity. The gain relative to a SISO system is given by (8.64):

$G = |h_1|^2 + \ldots + |h_N|^2$

For our rich scattering model, the $h_i \sim CN(0,1)$ are i.i.d., hence the $|h_i|^2$ are i.i.d. exponential random variables, each with mean one. We state without proof that the sum of $N$ such random variables is a Gamma random variable with PDF and CDF given by

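$p_G(g) = \frac{g^{N-1}e^{-g}}{(N-1)!}, \quad g \geq 0$ (8.92)

$F_G(g) = P[G \leq g] = \sum_{k=N}^{\infty} e^{-g}\frac{g^k}{k!} = 1 - \sum_{k=0}^{N-1} e^{-g}\frac{g^k}{k!}, \quad g \geq 0$ (8.93)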

(a) Use the preceding results to compute the probability of outage (log scale) for $N$-fold receive diversity versus link margin (dB) relative to the SISO benchmark for $N = 1, 2, 4$. That is, reproduce the results displayed in Figure 8.20.

(b) Optional: It may be an interesting exercise to use simulations to compute the empirical CDF of $G$, and to check that you get the same outage rate curves as those in (a).
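Formulas (8.92) and (8.93) are easy to evaluate directly. The sketch below assumes that outage at a link margin of $m$ dB means $G < 10^{-m/10}$. It also includes the optional simulation check of part (b), and it uses only the Python standard library:

```python
import math
import random


def gamma_cdf(g, n):
    """F_G(g) from (8.93): CDF of a sum of n i.i.d. unit-mean exponentials."""
    if g <= 0:
        return 0.0
    return 1.0 - math.exp(-g) * sum(g ** k / math.factorial(k) for k in range(n))


def outage(margin_db, n):
    """P[G < 10^(-margin/10)]: the fading gain falls below the link margin (assumed definition)."""
    return gamma_cdf(10 ** (-margin_db / 10), n)


for n in (1, 2, 4):
    print(n, ["%.2e" % outage(m, n) for m in (0, 10, 20, 30)])


def sample_gain(n):
    """One draw of G = |h_1|^2 + ... + |h_n|^2 under the rich scattering model."""
    return sum(random.expovariate(1.0) for _ in range(n))


# Part (b): empirical outage at a 10 dB margin for N = 2 (seeded for repeatability)
random.seed(0)
trials, n, margin_db = 200_000, 2, 10
x = 10 ** (-margin_db / 10)
empirical = sum(1 for _ in range(trials) if sample_gain(n) < x) / trials
print(empirical, outage(margin_db, n))
```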
Problem 8.13 (Alamouti scheme with multiple receive antennas) Consider a $2 \times N$ MIMO system where the transmitter employs Alamouti space-time coding as in (8.67). Let $\mathbf{H} = (\mathbf{h}_1\ \mathbf{h}_2)$ denote the $N \times 2$ channel matrix, with $N \times 1$ columns $\mathbf{h}_1$ and $\mathbf{h}_2$.

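(a) Show that the optimal receiver is given by (8.70), where $\mathbf{u}_1 = \frac{1}{\sqrt{2}}(\mathbf{h}_1^T, \mathbf{h}_2^H)^T$ and $\mathbf{u}_2 = \frac{1}{\sqrt{2}}(\mathbf{h}_2^T, -\mathbf{h}_1^H)^T$, and

$\mathbf{y} = \begin{pmatrix} \mathbf{y}[1] \\ \mathbf{y}^*[2] \end{pmatrix}$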


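(b) Show that the SNR gain relative to our unfaded SISO system with the same transmit power and constant channel gain of unity is given by

$G = \frac{\|\mathbf{h}_1\|^2 + \|\mathbf{h}_2\|^2}{2} = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{2}|h_{ij}|^2$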


Comparing with the receive diversity gain (8.64) in a $1 \times N$ system, answer the following True/False questions (give reasons for your answers).

(c) True or False: A $2 \times 2$ MIMO system with Alamouti space-time coding is 3 dB better than a $1 \times 2$ MIMO system with receive diversity.

(d) True or False: A $2 \times 2$ MIMO system with Alamouti space-time coding is 3 dB worse than a $1 \times 4$ MIMO system with receive diversity.

(e) Use the approach in Problem 8.12 to compute and plot the outage probability (log scale) versus link margin (dB) relative to the unfaded SISO system for a $2 \times N$ MIMO system, $N = 1, 2, 4$. You should get a plot that follows those in Figure 8.22.
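The sketch below gives a numerical check of parts (a) and (b). It assumes the Alamouti convention in which the stacked vector is $(\mathbf{y}[1]; \mathbf{y}^*[2])$ and the signal vectors are $\frac{1}{\sqrt{2}}(\mathbf{h}_1; \mathbf{h}_2^*)$ and $\frac{1}{\sqrt{2}}(\mathbf{h}_2; -\mathbf{h}_1^*)$; this is our reading of (8.67)-(8.70). Channel draws follow the rich scattering model, and the trial count is arbitrary:

```python
import numpy as np


def alamouti_vectors(h1, h2):
    """Effective signal vectors for y = (y[1]; conj(y[2])), with power split 1/sqrt(2)."""
    u1 = np.concatenate([h1, h2.conj()]) / np.sqrt(2)
    u2 = np.concatenate([h2, -h1.conj()]) / np.sqrt(2)
    return u1, u2


rng = np.random.default_rng(4)
N, trials = 2, 100_000
H = (rng.standard_normal((trials, N, 2)) + 1j * rng.standard_normal((trials, N, 2))) / np.sqrt(2)

# One channel draw: u1 and u2 are orthogonal, so symbol-by-symbol matched filtering is optimal
u1, u2 = alamouti_vectors(H[0, :, 0], H[0, :, 1])
print(abs(np.vdot(u1, u2)))                     # zero up to rounding

# Gain G = (||h1||^2 + ||h2||^2) / 2 for every draw, and its empirical outage at a 10 dB margin
G = 0.5 * np.sum(np.abs(H) ** 2, axis=(1, 2))
print(np.mean(G < 10 ** (-10 / 10)))
```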


Problem 8.14 (Outage rates for spatial multiplexing with ZF reception) Consider two-fold spatial multiplexing in a $2 \times N$ MIMO system with $N \times 2$ channel matrix $\mathbf{H}$. Define the $2 \times 2$ matrix $\mathbf{R} = \mathbf{H}^H\mathbf{H}$.

(a) Referring to the spatial multiplexing model (8.73), how do the entries of $\mathbf{R}$ relate to the signal vectors $\mathbf{u}_1$ and $\mathbf{u}_2$?

(b) Show that the energy of the projection of $\mathbf{u}_1$ orthogonal to the subspace spanned by $\mathbf{u}_2$ is given by

$\|P_{\mathbf{u}_2}^{\perp}\mathbf{u}_1\|^2 = R(1,1) - \frac{|R(1,2)|^2}{R(2,2)}$

where $R(i,j)$ denotes the $(i,j)$th entry of $\mathbf{R}$, $i,j = 1,2$.

(c) If we fix the transmit power to that of the SISO benchmark (splitting it equally between the two data streams), show that the gain seen by the first data stream is given by

$G_1 = \frac{1}{2}\left(R(1,1) - \frac{|R(1,2)|^2}{R(2,2)}\right)$

Similarly, the gain seen by the second data stream is given by

$G_2 = \frac{1}{2}\left(R(2,2) - \frac{|R(1,2)|^2}{R(1,1)}\right)$

Note that, under our rich scattering model, $G_1$ and $G_2$ are identically distributed random variables.

(c) Use computer simulations with the rich scattering model to plot the outage rate versus link margin for $2 \times 2$ and $2 \times 4$ MIMO with two-fold spatial multiplexing and ZF reception. You should get a plot similar to Figure 8.24. Discuss how the performance compares with that of the Alamouti scheme.
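The sketch below is a Monte Carlo approach to the simulation part, using the expression for $G_1$ above under the rich scattering model. The trial count and seed are arbitrary. The small $2 \times 2$ example matrix is hypothetical and serves to check $G_1$ against the equivalent form $\frac{1}{2}/[(\mathbf{H}^H\mathbf{H})^{-1}]_{11}$:

```python
import numpy as np


def zf_gains(H):
    """Per-stream gains G1, G2 for two-fold spatial multiplexing with ZF, from R = H^H H."""
    R = H.conj().T @ H
    g1 = 0.5 * (R[0, 0].real - abs(R[0, 1]) ** 2 / R[1, 1].real)
    g2 = 0.5 * (R[1, 1].real - abs(R[0, 1]) ** 2 / R[0, 0].real)
    return g1, g2


def smux_outage(margin_db, N, trials=50_000, seed=0):
    """Monte Carlo outage of stream 1 with i.i.d. CN(0,1) channel entries."""
    rng = np.random.default_rng(seed)
    H = (rng.standard_normal((trials, N, 2)) + 1j * rng.standard_normal((trials, N, 2))) / np.sqrt(2)
    R = H.conj().transpose(0, 2, 1) @ H              # batch of 2 x 2 matrices H^H H
    G1 = 0.5 * (R[:, 0, 0].real - np.abs(R[:, 0, 1]) ** 2 / R[:, 1, 1].real)
    return np.mean(G1 < 10 ** (-margin_db / 10))


H_example = np.array([[1.0, 0.5j], [0.2, 1.0]])     # hypothetical 2 x 2 channel
g1, g2 = zf_gains(H_example)

for N in (2, 4):
    print(N, [smux_outage(m, N) for m in (0, 10, 20)])
```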
Problem 8.15 (Outage rates for spatial multiplexing with ZF reception) (a) Argue that $2 \times N$ Alamouti space-time coding is exactly 3 dB worse than $1 \times 2N$ receive diversity.

(b) For $2 \times N$ spatial multiplexing with ZF reception, approximate its performance as $x$ dB worse than that of $1 \times N'$ receive diversity, for suitable $x$ and $N'$. (Note that spatial multiplexing has twice the bandwidth efficiency of receive diversity, but it loses 3 dB of power up front because the power is split between the two data streams.)

Answer: Approximately 3 dB worse than $1 \times (N-1)$ receive diversity. That is, if the gain relative to the SISO benchmark for a $1 \times N$ receive diversity system is denoted as $G_{\text{rx-div}}(N)$, then the gain for a $2 \times N$ spatially multiplexed system is $G_{\text{smux}}(N) \approx G_{\text{rx-div}}(N-1)/2$. Thus, the CDF, and hence the outage rate, of the spatially multiplexed system is approximated as

$P[G_{\text{smux}}(N) < x] \approx P[G_{\text{rx-div}}(N-1) < 2x]$

(c)  Use  the  results  in  (b),  and  the  analytical  framework  in  Problem  8.12,  to  obtain  an  analytical 
approximation  for  the  simulation  results  in  Problem  8.14(c). 
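Combining the Gamma CDF (8.93) with the scalings in (a) and (b) gives closed-form outage curves. The standard-library sketch below assumes the same outage definition as before: $G < 10^{-m/10}$ at a link margin of $m$ dB. The Alamouti curve is exact. The spatial multiplexing curve uses the approximation from the answer to (b):

```python
import math


def gamma_cdf(g, n):
    """CDF (8.93) of a sum of n i.i.d. unit-mean exponentials."""
    if g <= 0:
        return 0.0
    return 1.0 - math.exp(-g) * sum(g ** k / math.factorial(k) for k in range(n))


def outage_rx_div(margin_db, n):          # 1 x n receive diversity (Problem 8.12)
    return gamma_cdf(10 ** (-margin_db / 10), n)


def outage_alamouti(margin_db, n):        # 2 x n Alamouti: G = Gamma(2n) / 2, part (a)
    return gamma_cdf(2 * 10 ** (-margin_db / 10), 2 * n)


def outage_smux_approx(margin_db, n):     # 2 x n spatial multiplexing with ZF, answer to (b)
    return gamma_cdf(2 * 10 ** (-margin_db / 10), n - 1)


for n in (2, 4):
    print(n, ["%.2e" % f(10, n) for f in (outage_rx_div, outage_alamouti, outage_smux_approx)])
```

Sweeping the margin and plotting `outage_smux_approx` on a log scale gives the analytical counterpart of the simulated curves in Problem 8.14.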


Software Lab 8.1: Introduction to Equalization in Single-carrier Systems

Reading: Sections 8.1-8.2; Chapter 4 (linear modulation); Section 5.6 (Gaussian random variables and the Q function). This lab can be completed without systematic coverage of Chapter 6; we state and use probability of error expressions from Chapter 6, but knowing how they are derived is not required for the lab.

Lab  Objectives:  To  understand  the  need  for  equalization  in  communication  systems,  and  to 
implement  linear  MMSE  equalizers  adaptively. 

Laboratory  Assignment 

0) Use as your transmit and receive filters the SRRC pulse employed in Software Labs 4.1 and 6.1. Combining the code for these filters with the code fragments developed in this chapter provides the code required for this lab. As in Software Labs 4.1 and 6.1, the transmit, channel, and receive filters are implemented at rate 4/T. For simplicity, we consider BPSK signaling throughout this lab, and consider only real-valued signals. Generate nsymbols = ntraining + npayload (numbers to be specified later) ±1 BPSK symbols as in Software Lab 6.1, and pass them through the transmit, channel, and receive filters to get noiseless received samples at rate 4/T.

1) Let us start with a trivial channel filter as before. Set nsymbols = 200. The number of rate 4/T samples at the output of the receive filter is therefore 800, plus tails at either end because the effective pulse modulating each symbol extends over multiple symbol intervals. Plot an eye diagram (e.g., using code fragment 8.1.3) using, say, 400 samples in the middle. You should get an eye diagram that looks like Figure 8.26: the cascade of the transmit and receive filters is approximately Nyquist, and the eye is open, so that we can find a sampling time at which we can distinguish well between +1 and -1, despite the influence of neighboring symbols.

2) Now introduce a non-trivial channel filter. In particular, consider a channel filter specified (at rate 4/T) using the following Matlab command:

channel_filter = [-0.7, -0.3, 0.3, 0.5, 1, 0.9, 0.8, -0.7, -0.8, 0.7, 0.8, 0.6, 0.3]';

Generate an eye diagram again. You should get something that looks like Figure 8.27. Notice now that there is no sampling time at which you can clearly make out the difference between +1 and -1 symbols. The eye is now said to be closed due to ISI, so that we cannot make symbol decisions just by passing appropriately timed received samples through a thresholding device.

Figure 8.26: Eye diagram for a non-dispersive channel. The eye is open.

Figure 8.27: Eye diagram for a dispersive channel. The eye is closed.

3) We are now going to evaluate probability of error without and with equalization. First, let us generate the noisy output of the receive filter. We need to generate nsymbols = ntraining + npayload (numbers to be specified later) ±1 BPSK symbols as in Software Lab 6.1, and pass them through the transmit filter, the dispersive channel, and the receive filter to get noiseless received samples at rate 4/T. Since we are signaling along the real axis only, add iid $N(0, \sigma^2)$ real-valued noise samples at the input to the receive filter (as in Software Lab 6.1, choose $\sigma^2 = N_0/2$ corresponding to a specified value of $E_b/N_0$). Pass these (rate 4/T) noise samples through the receive filter, and add the result to the signal contribution at the receive filter output.

4) Performance without equalization: Let $\{r_k\}$ denote the output of the receive filter, and let $Z[n] = r_{d+4(n-1)}$, $n = 1, 2, \ldots,$ nsymbols denote the best symbol-rate decision statistics you can obtain by subsampling the receive filter output at rate 1/T. As in the solutions to earlier labs, choose the decision delay d equal to the location of the maximum of the overall response to a single symbol (which now includes the channel). For nsymbols = 10100, compute the error probability of the decision rule $\hat{b}[n] = \mathrm{sign}(Z[n])$ as a function of $E_b/N_0$, where the latter ranges from 5 to 20 dB. Compare with the ideal error probability curve for BPSK signaling over the same range of $E_b/N_0$. This establishes that a simple one-sample-per-symbol decision rule does not work well for non-ideal channels, and motivates the equalization schemes discussed below.
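Steps 3 and 4 rely on two small formulas: the noise standard deviation for a target $E_b/N_0$, and the ideal BPSK error probability $Q(\sqrt{2E_b/N_0})$. The lab itself uses Matlab; the sketch below is in Python. It assumes energies are measured in sample units, i.e., $E_b$ is the sum of squared noiseless samples at the receive filter input divided by the number of bits. With that convention, a per-sample noise variance of $\sigma^2 = N_0/2$ is consistent. Indices are 0-based, so `d` is the 0-based position of the peak of the overall response:

```python
import math

import numpy as np


def noise_sigma(eb, ebno_db):
    """Std of the real N(0, sigma^2) noise at the receive-filter input, sigma^2 = N0 / 2."""
    n0 = eb / 10 ** (ebno_db / 10)
    return math.sqrt(n0 / 2)


def ideal_bpsk_ber(ebno_db):
    """Q(sqrt(2 Eb/N0)), using Q(x) = 0.5 erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(math.sqrt(10 ** (ebno_db / 10)))


def unequalized_ber(rx, bits, d):
    """Step 4 with 0-based indexing: Z[n] = rx[d + 4n], decision sign(Z[n])."""
    z = rx[d::4][: len(bits)]
    return np.mean(np.sign(z) != bits)


# Eb would come from the noiseless receive-filter input, e.g. np.sum(signal**2) / nsymbols
for ebno_db in (5, 10, 15, 20):
    print(ebno_db, ideal_bpsk_ber(ebno_db))
```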

Linear equalization: We now consider linear equalization, where the decision for symbol $b[n]$ is based on linear processing of a vector of samples $\mathbf{r}[n]$ of length $L = 2M + 1$. The entries of $\mathbf{r}[n]$ are samples spaced by $T/q$, with the center sample being the same as the decision statistic in part 4: $q = 1$ corresponds to symbol-spaced sampling, and $q > 1$ corresponds to fractionally spaced sampling. We consider two cases: $q = 1$ and $q = 2$.

$\mathbf{r}[n] = \left(r_{d+4(n-1)-(4/q)M},\ r_{d+4(n-1)-(4/q)(M-1)},\ \ldots,\ r_{d+4(n-1)},\ \ldots,\ r_{d+4(n-1)+(4/q)(M-1)},\ r_{d+4(n-1)+(4/q)M}\right)^T$

The decision rule we use is

$\hat{b}[n] = \mathrm{sign}(\mathbf{c}^T\mathbf{r}[n])$ (8.94)

where $\mathbf{c}$ is a correlator whose choice is to be specified. Note that the decision rule in part 4 corresponds to the choice $\mathbf{c} = (0, \ldots, 0, 1, 0, \ldots, 0)^T$, since it uses only the center sample.
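Forming the vectors $\mathbf{r}[n]$ is mostly index bookkeeping. The NumPy sketch below uses 0-based indices and assumes that `rx` holds the rate-$4/T$ receive filter output and that $q \in \{1, 2, 4\}$. The helper names are our own:

```python
import numpy as np


def observation_vectors(rx, d, M, q, n_symbols):
    """Row n holds r[n]^T: rx[d + 4n + (4/q) m] for m = -M, ..., M (0-based n and indices).

    Assumes q in {1, 2, 4} and that every index lies inside rx
    (choose d >= (4/q) M and keep enough tail samples).
    """
    step = 4 // q                                   # spacing T/q in units of T/4
    offsets = step * np.arange(-M, M + 1)           # L = 2M + 1 taps
    centers = d + 4 * np.arange(n_symbols)          # one center sample per symbol
    return rx[centers[:, None] + offsets[None, :]]


def decide(Rvec, c):
    """Decision rule (8.94): sign(c^T r[n]) for every row of Rvec."""
    return np.sign(Rvec @ c)


rx = np.arange(100.0)                               # toy input so the indexing is visible
print(observation_vectors(rx, d=8, M=1, q=2, n_symbols=3))
# rows: [6, 8, 10], [10, 12, 14], [14, 16, 18]
```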

The vector of samples $\mathbf{r}[n]$ contains contributions from both the desired symbol and from ISI due to $b[n \pm 1]$, $b[n \pm 2]$, etc. We implement the linear minimum mean squared error (MMSE) equalizer using a least squares adaptive implementation, as discussed in Section 8.2.1.

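5) For the least squares implementation, assume that the first ntraining symbols are known training symbols, $b_1, \ldots, b_{\text{ntraining}}$. Define the $L \times L$ matrix

$\hat{\mathbf{R}} = \frac{1}{\text{ntraining}}\sum_{n=1}^{\text{ntraining}} \mathbf{r}[n]\mathbf{r}^T[n]$

and the $L \times 1$ vector

$\hat{\mathbf{p}} = \frac{1}{\text{ntraining}}\sum_{n=1}^{\text{ntraining}} b[n]\mathbf{r}[n]$

The MMSE correlator is now approximated as

$\mathbf{c}_{MMSE} = \hat{\mathbf{R}}^{-1}\hat{\mathbf{p}}$ (8.95)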
6) Now, the correlator obtained via (8.95) is used to make decisions, using the decision rule (8.94), on the unknown symbols n = ntraining + 1, ..., nsymbols.

7) Fix ntraining = 100 and npayload = 10000. For L = 3, 5, 7, 9, and q = 1, 2, implement linear MMSE equalizers, and plot their error probabilities (for the payload symbols) as a function of $E_b/N_0$, in the range 5 to 20 dB. Compare with the unequalized error probability and the ideal error probability found in part 4.

Hint: An efficient way to generate the statistics $\mathbf{c}^T\mathbf{r}[n]$ is to pass an appropriate rate q/T subsequence of the receive filter output through a filter whose impulse response is the time reverse of $\mathbf{c}$, and then to subsample the output of this equalizing filter at rate 1/T. This is much faster than correlating $\mathbf{c}$ with $\mathbf{r}[n]$ separately for each n.

8) Comment on the performance of symbol-spaced versus fractionally spaced equalization. Comment on the effect of equalizer length on performance. What is the effect of increasing or decreasing the training period?
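Steps 5 to 7 in miniature: the Python sketch below performs least-squares training (8.95), then applies the fast filtering implementation from the hint and checks it against direct correlation. To stay self-contained, it replaces the lab's rate-$4/T$ SRRC chain with a hypothetical symbol-spaced ($q = 1$) response and a fixed noise level. Its error rates therefore say nothing about the lab's channel:

```python
import numpy as np

rng = np.random.default_rng(5)
n_train, n_payload = 100, 10000
bits = rng.choice([-1.0, 1.0], size=n_train + n_payload)

# Hypothetical symbol-spaced end-to-end response with a closed eye (not the lab's channel)
h = np.array([0.4, 1.0, -0.6, 0.3])
rx = np.convolve(bits, h) + 0.2 * rng.standard_normal(len(bits) + len(h) - 1)
d = 1                                   # peak of h: symbol n dominates rx[n + d]
M = 3
L = 2 * M + 1                           # equalizer length

# r[n] = (rx[n+d-M], ..., rx[n+d+M]); zero-pad so the edge vectors exist
padded = np.concatenate([np.zeros(M), rx, np.zeros(M)])
Rvec = padded[np.arange(len(bits))[:, None] + d + np.arange(L)[None, :]]

# Least-squares estimates of R and p from the training symbols, then (8.95)
Rt = Rvec[:n_train]
R_hat = Rt.T @ Rt / n_train
p_hat = Rt.T @ bits[:n_train] / n_train
c = np.linalg.solve(R_hat, p_hat)

# Direct correlation c^T r[n] versus filtering with the time-reversed c, then subsampling
z_direct = Rvec @ c
y = np.convolve(padded, c[::-1])
z_filter = y[d + L - 1 : d + L - 1 + len(bits)]

pay = slice(n_train, None)
ber_eq = np.mean(np.sign(z_direct[pay]) != bits[pay])
ber_raw = np.mean(np.sign(Rvec[pay, M]) != bits[pay])   # center sample only: c = (0,..,1,..,0)
print(ber_raw, ber_eq)
```

For $q = 2$ in the lab, the same filtering step applies to the rate-$2/T$ subsequence. The output is then kept at every second sample instead of every sample.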

Lab  Report:  Your  lab  report  should  document  the  results  of  the  preceding  steps  in  order.  Describe 
the  reasoning  you  used  and  the  difficulties  you  encountered. 


Software Lab 8.2: Simplified Simulation Model for an OFDM Link

Reading:  Section  8.3;  Chapter  4  (linear  modulation). 

Lab Objectives: To develop a hands-on understanding of basic OFDM transmission and reception.


Laboratory  Assignment 


We would like to leverage the code from Software Lab 8.1 as much as possible, so we set the DAC filter to be the transmit filter in that lab, and the receive filter to its matched filter as before. The main difference is that the time domain samples sent through the DAC filter are obtained by taking the inverse FFT of the frequency domain symbols, and inserting a cyclic prefix. We fix the constellation as Gray coded QPSK.

Step 1 (Exploring time and frequency domain relationships in OFDM): Let us first discuss the structure of a single "OFDM symbol," which carries N complex-valued symbols in the frequency domain. Here N is the number of subcarriers, chosen to be a power of 2. Let L be the length of the cyclic prefix. Set N = 256, L = 20 for these initial explorations, but keep the parameters programmable for later use.

1a) Generate N Gray coded QPSK symbols $B = \{B[k], k = 1, \ldots, N\}$. (You can use the function qpskmap developed in Software Lab 6.1 for this purpose.) Take the inverse FFT to obtain time domain samples $b = \{b[n], n = 1, \ldots, N\}$.

1b) Append the last L time domain samples to the beginning, to get a length N + L sequence of time domain samples $b' = \{b'[n], n = 1, \ldots, N + L\}$. That is, $b'[1] = b[N-L+1], \ldots, b'[L] = b[N]$, $b'[L+1] = b[1], \ldots, b'[N+L] = b[N]$.

1c) Take the first N samples of $b'$, say $r_1 = \{b'[1], \ldots, b'[N]\}$. Show that the FFT output (say $R_1 = \{R_1[k]\}$) is related to the original frequency domain symbols B through a frequency domain channel H as follows: $R_1[k] = H[k]B[k]$. Find and plot the amplitude $|H[k]|$ and phase $\arg(H[k])$ versus k.

1d) Repeat 1c) for the time domain samples $\{b'[3], \ldots, b'[N+2]\}$ (i.e., skip the first two samples of $b'$). How are the frequency domain channels in 1c) and 1d) related? (What we are doing here is exploring what cyclic shifts in the time domain do in the frequency domain.)

Step 2 (Generating multiple OFDM symbols): Now, we generate K frames, each carrying N Gray coded QPSK symbols. Set N = 256, L = 20, K = 5 for numerical results and plots in this step.

2a) For each frame, generate time domain samples and add a cyclic prefix, as in Steps 1a) and 1b). Then, append the time domain samples for successive frames together. We now have a stream of $K(N + L)$ time domain samples, analogous to the time domain symbols sent in Software Lab 8.1.

2c) Pass the time domain symbols through the same transmit filter (this is the DAC in Figure 8.9) as in Software Labs 4.1, 6.1 and 8.1, again oversampling by a factor of 4. (That is, if the time domain samples are at rate $1/T_s$, the filter is implemented at rate $4/T_s$.) This gives us a rate $4/T_s$ transmitted signal.

2d) Compute the peak-to-average power ratio (PAPR) in dB for this transmitted signal (OFDM is notorious for having a large PAPR). This is done by taking the ratio of the maximum to the average value of the magnitude squared of the time domain samples.

2e) Note that the original QPSK symbols in the frequency domain have a PAPR of one (0 dB), but the time domain samples are generated by mixing these together. Invoking the central limit theorem, we might therefore expect the time domain samples to have an approximately Gaussian distribution. Plot a histogram of the I and Q components of the time domain samples. Do they look Gaussian?

2f) As in Software Lab 6.1, assume an ideal channel filter and pass the transmitted signal through a receive filter matched to the transmit filter. This gives a rate $4/T_s$ noiseless received signal.

2g) Subsample the received signal at rate $1/T_s$, starting with a delay of d samples (experiment to see what choice of d works well, perhaps based on the peak of the cascade of the transmit, channel and receive filters). The first N samples correspond to the first frame. Take the FFT of these N samples to get $R_1[k]$. Now, estimate the frequency domain channel coefficients $\{H[k]\}$ by using the known transmitted symbols $B_1[k]$ in the first frame as training. That is,

$\hat{H}[k] = R_1[k]/B_1[k]$

Plot the magnitude and phase of the channel estimates and comment on how they compare to what you saw in 1c) and 1d).

2h) Now, use the channel estimate from 2g) to demodulate the succeeding frames. If frame m uses time domain samples over a window $[a, b]$, then frame $m + 1$ uses time domain samples over a window $[a + (N + L)T_s, b + (N + L)T_s]$. Denoting the FFT of the time domain samples for frame m as $R_m[k]$, the decision statistics for the frequency domain symbols for the mth frame are given by

$\hat{B}_m[k] = \hat{H}^*[k]R_m[k]$

You can now decode the bits and check that you get a BER of zero (there is no noise so far). Also, display scatter plots of the decision statistics to see that you are indeed seeing a QPSK constellation after compensating for the channel.
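Below is a compact NumPy sketch of Step 2, which also covers Step 3 once the channel is nontrivial. It makes several simplifications of our own. There is no DAC or receive filter. The channel is a hypothetical three-tap channel at the sample rate, shorter than the cyclic prefix. Frame timing is taken as known, so each cyclic prefix is simply discarded. PAPR is computed on the time domain samples as in 2d):

```python
import numpy as np


def qpsk(bits):
    """Gray-coded QPSK stand-in: bit 0 -> +1, bit 1 -> -1 on each of I and Q."""
    return ((1 - 2 * bits[..., 0]) + 1j * (1 - 2 * bits[..., 1])) / np.sqrt(2)


def ofdm_modulate(B, L):
    """IFFT each frame (row of B), prepend a length-L cyclic prefix, concatenate the frames."""
    b = np.fft.ifft(B, axis=1)
    return np.concatenate([b[:, -L:], b], axis=1).ravel()


N, L, K = 256, 20, 5
rng = np.random.default_rng(7)
bits = rng.integers(0, 2, size=(K, N, 2))
B = qpsk(bits)
tx = ofdm_modulate(B, L)

# 2d: PAPR of the time domain samples (no DAC filtering in this sketch)
papr_db = 10 * np.log10(np.max(np.abs(tx) ** 2) / np.mean(np.abs(tx) ** 2))

# Hypothetical sample-rate channel, shorter than the cyclic prefix (h = [1] reproduces Step 2)
h = np.array([0.8, 0.5j, -0.2])
rx = np.convolve(tx, h)[: len(tx)]

# 2g: drop each cyclic prefix, FFT every frame, and estimate H[k] from frame 1
R = np.fft.fft(rx.reshape(K, N + L)[:, L:], axis=1)
H_hat = R[0] / B[0]

# 2h: decision statistics conj(H_hat) R_m for frames 2..K, then QPSK bit decisions
stats = np.conj(H_hat) * R[1:]
bits_hat = np.stack([stats.real < 0, stats.imag < 0], axis=-1).astype(int)
ber = np.mean(bits_hat != bits[1:])
print(f"PAPR = {papr_db:.1f} dB, BER = {ber}")
```

Plotting `np.abs(H_hat)` and `np.angle(H_hat)` gives the channel plots requested in 2g) and 3b).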

Step 3 (Channel compensation): We now introduce a nontrivial channel (still no noise). Increase the cyclic prefix length if needed: it should be long enough to cover the cascade of the transmit, channel and receive filters. But remember that the cyclic prefix is at rate $1/T_s$, whereas the filter cascade is at rate $4/T_s$.

3a) Repeat Step 2f, but now with a nontrivial channel filter modeled at rate $4/T_s$. Use the channels you have tried out in Software Lab 8.1 (still no noise). For example:

channel_filter = [-0.7, -0.3, 0.3, 0.5, 1, 0.9, 0.8, -0.7, -0.8, 0.7, 0.8, 0.6, 0.3]';

3b) Repeat Step 2g. Comment on how the magnitude and phase of the frequency domain channel differ from what you saw in 2g, 1c and 1d.

3c) Repeat Step 2h. Check that you get a BER of zero, and that your decision statistics give nice QPSK scatter plots.

3d) Check that everything still works out as you vary the number of subcarriers N (e.g., N = 512, 1024, 2048), the cyclic prefix length L and the number of frames K.

Step 4 (Effect of noise): Now, add noise as in Software Labs 6.1 and 8.1. Specifically, at the input to the receive filter, add independent and identically distributed (iid) complex Gaussian noise, such that the real and imaginary parts of each sample are iid $N(0, \sigma^2)$ (we choose $\sigma^2 = N_0/2$ corresponding to a specified value of $E_b/N_0$). Let us fix N = 1024 for concreteness, and set the cyclic prefix to just a little longer than the minimum required for the channel you are considering. Set the number of frames to K = 10. Try a couple of values of $E_b/N_0$, such as 5 dB and 8 dB.

4a) While you can estimate $E_b$ analytically, estimate it by taking the energy of the transmitted signal in 3a, and dividing it by the number of bits in the payload (i.e., excluding the first frame). Use this to set the value of $N_0$ for generating the noise samples.

4b) Pass the (rate $4/T_s$) noise samples through the receive filter, and add the result to the output of part 3a.

4c) Consider first a noiseless channel estimate, in which you carry out Step 3b (estimating the channel based on frame 1) before you add noise to the output of 3a. Now add the noise and carry out Step 3c (demodulating the other frames). Estimate the BER and compare with the analytical value for ideal QPSK. Show the scatter plots of the decision statistics.

4d)  Repeat  4c,  except  that  you  now  estimate  the  channel  based  on  frame  1  after  adding  noise. 
Discuss  how  the  BER  degrades.  Compare  the  channel  estimates  from  parts  4c  and  4d  on  the 
same  plot. 

Note: You may notice a significant BER degradation, but that is because the channel estimation technique is naive (the channel coefficients for neighboring subcarriers are highly correlated, but our estimate is not exploiting this property). Exploring better channel estimation techniques is beyond the scope of this lab, but you are encouraged to browse the literature on OFDM channel estimation to dig deeper.
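Below is a NumPy sketch of Step 4, with simplifications of our own. There is no DAC or receive filter, the channel is a hypothetical three-tap channel at the sample rate, and frame timing is known. We read 4a) as computing the received energy of the payload frames only and dividing by the payload bit count. The noise has independent real and imaginary parts, each of variance $\sigma^2 = N_0/2$. The two BERs correspond to 4c) and 4d):

```python
import math

import numpy as np

rng = np.random.default_rng(8)
N, L, K = 1024, 8, 10
h = np.array([0.8, 0.5j, -0.2])          # hypothetical sample-rate channel, shorter than L
ebno_db = 5.0

bits = rng.integers(0, 2, size=(K, N, 2))
B = ((1 - 2 * bits[..., 0]) + 1j * (1 - 2 * bits[..., 1])) / np.sqrt(2)   # Gray QPSK stand-in
b = np.fft.ifft(B, axis=1)
tx = np.concatenate([b[:, -L:], b], axis=1).ravel()
sig = np.convolve(tx, h)[: len(tx)]      # noiseless received samples

# 4a: Eb = energy of the payload frames (2..K) / number of payload bits (our reading)
Eb = np.sum(np.abs(sig[N + L:]) ** 2) / ((K - 1) * N * 2)
N0 = Eb / 10 ** (ebno_db / 10)
sigma = math.sqrt(N0 / 2)                # std of the real part and of the imaginary part
noise = sigma * (rng.standard_normal(len(sig)) + 1j * rng.standard_normal(len(sig)))


def freq_domain(x):
    """Drop each cyclic prefix and FFT every frame."""
    return np.fft.fft(x.reshape(K, N + L)[:, L:], axis=1)


def payload_ber(x, H_hat):
    """Demodulate frames 2..K with the channel estimate H_hat and count bit errors."""
    stats = np.conj(H_hat) * freq_domain(x)[1:]
    bits_hat = np.stack([stats.real < 0, stats.imag < 0], axis=-1).astype(int)
    return np.mean(bits_hat != bits[1:])


H_clean = freq_domain(sig)[0] / B[0]            # 4c: estimate before adding noise
H_noisy = freq_domain(sig + noise)[0] / B[0]    # 4d: estimate after adding noise
ber_clean = payload_ber(sig + noise, H_clean)
ber_noisy = payload_ber(sig + noise, H_noisy)
ber_ideal = 0.5 * math.erfc(math.sqrt(10 ** (ebno_db / 10)))   # ideal QPSK: Q(sqrt(2 Eb/N0))
print(ber_ideal, ber_clean, ber_noisy)
```

Plotting `np.abs(H_clean)` and `np.abs(H_noisy)` on the same axes gives the comparison requested in 4d).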

Step 5 (Consolidation): Once you are happy with your code, plot the BER (log scale) as a function of $E_b/N_0$ (dB) for the channel in Software Lab 8.1. Plot three curves: ideal QPSK, OFDM with noiseless channel estimation, and OFDM with noisy channel estimation. Comment on the relation between the curves.

Lab  Report:  Your  lab  report  should  document  the  results  of  the  preceding  steps  in  order. 
Describe  the  reasoning  you  used  and  the  difficulties  you  encountered. 


Software  Lab  8.3:  MIMO  signal  processing 

Reading:  Section  8.4 

Lab  Objectives:  To  gain  hands-on  exposure  to  basic  MIMO  signal  processing  at  the  transmitter 
and  receiver. 


Laboratory  Assignment 

Background: Consider the rich scattering model for a single subcarrier in a MIMO-OFDM
system with M transmit and N receive antennas. The N x M channel matrix H is modeled as
having i.i.d. $CN(0,1)$ entries.

Code  Fragment  8.7.1  (MIMO  matrix  with  i.i.d.  complex  Gaussian  entries) 

%M,N specified earlier
%MIMO matrix with iid CN(0,1) entries
H = (randn(N,M) + j*randn(N,M))/sqrt(2);

Let T denote the number of time-domain samples for our system. Let $x_i[t]$ denote the sample
transmitted from transmit antenna i at time t, where $1 \le i \le M$ and $1 \le t \le T$. Let
$\mathbf{x}[t] = (x_1[t], \ldots, x_M[t])^T$ denote the M x 1 vector of samples transmitted at time t, and let
$X = (\mathbf{x}[1], \ldots, \mathbf{x}[T])$ denote the M x T matrix containing all the transmitted samples. Our
convention is to normalize the net transmit power to one, so that $\sum_{i=1}^{M} |x_i[t]|^2$ equals one on average. For a single input
single output (SISO) system, this would lead to an average received SNR of $\overline{\mathrm{SNR}} = 1/(2\sigma^2)$, since the
magnitude squared of the channel gain is normalized to one, and the noise per receive antenna
is modeled as $CN(0, 2\sigma^2)$. We vary this hypothetical SISO system SNR when evaluating
performance.

The N x T received matrix Y, with $\mathbf{y}[t]$, $1 \le t \le T$, denoting the spatial vector of received
samples at time t, is then modeled as

$$Y = HX + \mathbf{N}$$

where $\mathbf{N}$ is an N x T matrix with i.i.d. $CN(0, 2\sigma^2)$ entries. This model is implemented in the
following code fragment.

Code  Fragment  8.7.2  (Received  signal  in  MIMO  system) 

%snrbardb specified earlier
%express snrbar in linear scale
snrbar = 10^(snrbardb/10);

%find noise standard deviation, assuming TX power = 1
%(snrbar = 1/(2*sigma^2))
sigma = sqrt(1/(2*snrbar));

%X = M x T matrix of transmitted samples, already specified earlier
%(normalized to unit power per time)

%%RECEIVED SIGNAL MODEL: N x T matrix
Y = H*X + sigma*randn(N,T) + j*sigma*randn(N,T);
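As a sanity check on the noise normalization in Code Fragment 8.7.2, the Python sketch below mirrors it. A unit-power input through a hypothetical unit-gain SISO channel should yield a measured SNR close to $\overline{\mathrm{SNR}} = 1/(2\sigma^2)$. The helpers `noise_sigma` and `mimo_receive` are our own names. Matrices are represented as nested lists of complex numbers.

```python
import math
import random

def noise_sigma(snrbar_db):
    """Per-dimension noise std such that snrbar = 1 / (2 sigma^2) for unit transmit power."""
    snrbar = 10 ** (snrbar_db / 10)
    return math.sqrt(1 / (2 * snrbar))

def mimo_receive(H, X, sigma, rng):
    """Y = H X + noise, with H (N x M) and X (M x T) given as nested lists of complex numbers."""
    N, M, T = len(H), len(X), len(X[0])
    Y = []
    for n in range(N):
        row = []
        for t in range(T):
            signal = sum(H[n][m] * X[m][t] for m in range(M))
            noise = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))  # CN(0, 2 sigma^2)
            row.append(signal + noise)
        Y.append(row)
    return Y

# SISO check: unit-power QPSK through a unit-gain channel at 10 dB
rng = random.Random(0)
T = 20000
qpsk = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
X = [[rng.choice(qpsk) for _ in range(T)]]
H = [[1 + 0j]]
Y = mimo_receive(H, X, noise_sigma(10), rng)
noise_power = sum(abs(Y[0][t] - X[0][t]) ** 2 for t in range(T)) / T  # estimates 2 sigma^2
measured_snr_db = 10 * math.log10(1 / noise_power)
```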

In  order  to  use  the  preceding  generic  code  fragments  for  a  particular  MIMO  scheme,  we  must 
(a)  map  the  transmitted  symbols  into  the  matrix  X  of  transmitted  samples,  and  (b)  process  the 
matrix  Y  of  received  samples  appropriately. 

Alamouti space-time code: We first consider the Alamouti space-time code for a 2 x 1 MIMO
system.  The  transmitted  samples  can  be  generated  using  the  following  code  fragment. 

Code  Fragment  8.7.3  (Transmitted  samples  for  Alamouti  space-time  code) 

%assume number of time samples T (even) has been specified
%QPSK symbols normalized to unit power per symbol
symbols = (sign(rand(1,T) - 0.5) + j*sign(rand(1,T) - 0.5))/sqrt(2);

%Alamouti space-time code mapping
X = zeros(2,T); %M=2
X(1,1:2:T) = symbols(1:2:T); %odd samples from antenna 1
X(2,1:2:T) = symbols(2:2:T); %odd samples from antenna 2
X(1,2:2:T) = -conj(symbols(2:2:T)); %even samples from antenna 1
X(2,2:2:T) = conj(symbols(1:2:T)); %even samples from antenna 2
%normalize samples so as to emit unit net power per unit time
X = X/sqrt(2);


Step 1: Consider a 2 x 1 MIMO system. Setting M = 2, N = 1 and SNR at 10 dB, put code
fragments  8.7.1,  8.7.3,  and  8.7.2  together  to  model  the  transmitted  and  received  matrices  X  and 
Y.  Setting  T  =  100,  do  a  scatter  plot  of  the  real  and  imaginary  parts  of  the  received  samples. 
The  received  samples  should  be  smeared  out  over  the  complex  plane,  since  the  signals  from  the 
two  transmit  antennas  interfere  with  each  other  at  the  receive  antenna. 

Step  2:  Compute  the  decision  statistics  (8.70)  based  on  the  received  matrix  Y.  You  may  use 
the  following  code  fragment,  but  you  must  explain  what  it  is  doing.  Do  a  scatter  plot  of  the 
decision  statistics.  You  should  recover  the  noisy  QPSK  constellation. 
Code Fragment 8.7.4 (Receiver processing for Alamouti space-time code for a 2 x 1
MIMO system)

Ytilde = zeros(2,T/2); %assume T even
Ytilde(1,:) = Y(1,1:2:T); %received samples in odd time slots
Ytilde(2,:) = conj(Y(1,2:2:T)); %conjugated received samples in even time slots

u1 = [H(1,1);conj(H(1,2))]; u2 = [H(1,2);-conj(H(1,1))];

Z = zeros(1,T);
Z(1:2:T) = u1'*Ytilde; %statistics for symbols(1:2:T)
Z(2:2:T) = u2'*Ytilde; %statistics for symbols(2:2:T)
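As a starting point for the explanation requested in Step 2, here is the algebra behind Code Fragment 8.7.4, written in our own shorthand. $h_{1i}$ stands for H(1,i), and $s_1, s_2$ are the pair of values carried in an odd time slot t and the following slot t+1, including any power normalization. In MATLAB, `u1'` is the conjugate transpose $\mathbf{u}_1^H$.

$$
\begin{aligned}
y[t] &= h_{11}s_1 + h_{12}s_2 + n[t],\\
y[t+1] &= -h_{11}s_2^* + h_{12}s_1^* + n[t+1],\\
\tilde{\mathbf{y}} = \begin{pmatrix} y[t] \\ y^*[t+1] \end{pmatrix} &= s_1\underbrace{\begin{pmatrix} h_{11} \\ h_{12}^* \end{pmatrix}}_{\mathbf{u}_1} + s_2\underbrace{\begin{pmatrix} h_{12} \\ -h_{11}^* \end{pmatrix}}_{\mathbf{u}_2} + \underbrace{\begin{pmatrix} n[t] \\ n^*[t+1] \end{pmatrix}}_{\tilde{\mathbf{n}}},\\
\mathbf{u}_1^H\mathbf{u}_2 &= h_{11}^*h_{12} - h_{12}h_{11}^* = 0, \qquad \mathbf{u}_1^H\mathbf{u}_1 = \mathbf{u}_2^H\mathbf{u}_2 = |h_{11}|^2 + |h_{12}|^2 = G,\\
\mathbf{u}_1^H\tilde{\mathbf{y}} &= G s_1 + \mathbf{u}_1^H\tilde{\mathbf{n}}, \qquad \mathbf{u}_2^H\tilde{\mathbf{y}} = G s_2 + \mathbf{u}_2^H\tilde{\mathbf{n}}.
\end{aligned}
$$

Because $\mathbf{u}_1 \perp \mathbf{u}_2$, each correlator removes the other symbol completely. The noise term $\mathbf{u}_i^H\tilde{\mathbf{n}}$ has variance $2\sigma^2 G$, so the SNR of each decision statistic scales with $G$.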

Step 3: Repeat Steps 1 and 2 for a few different realizations of the channel matrix. The quality
of the scatter plot in Step 2 should depend on the gain $G = |H(1,1)|^2 + |H(1,2)|^2$, improving as $G$ increases.

Step 4: Now, suppose that we use the Alamouti space-time code for a 2 x N MIMO system,
where N may be larger than one. Show that only the receiver processing code fragment 8.7.4
needs to be modified (other than changing the value of N in the other code fragments), with
Ytilde having dimension 2N x T/2, and u1, u2 each having dimension 2N x 1. Implement these
modifications and do a scatter plot of the decision statistics for N = 2 and N = 4, fixing the
equivalent  SISO  SNR  to  10  dB.  You  should  notice  a  qualitative  improvement  with  increasing  N 
as  you  run  several  channel  matrices,  although  the  plots  depend  on  the  channel  realization. 

Hint:  See  Problem  8.13. 
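One way the 2 x N modification might look is sketched below in Python, mirroring Code Fragments 8.7.3 and 8.7.4. Ytilde becomes the N odd-slot samples stacked over the N conjugated even-slot samples. $\mathbf{u}_1 = (H(:,1); H(:,2)^*)$ and $\mathbf{u}_2 = (H(:,2); -H(:,1)^*)$ have length 2N. The function names are hypothetical, and the $1/\sqrt{2}$ scaling reflects our reading of the unit-net-power convention.

```python
import math
import random

def alamouti_encode(symbols):
    """Map pairs (s1, s2) to a 2 x T matrix following Code Fragment 8.7.3, scaled by
    1/sqrt(2) so both antennas together emit unit power per time sample."""
    T = len(symbols)  # assumed even
    c = 1 / math.sqrt(2)
    X = [[0j] * T for _ in range(2)]
    for t in range(0, T, 2):
        s1, s2 = symbols[t], symbols[t + 1]
        X[0][t], X[1][t] = c * s1, c * s2                                   # first slot of the pair
        X[0][t + 1], X[1][t + 1] = -c * s2.conjugate(), c * s1.conjugate()  # second slot
    return X

def alamouti_decode(Y, H):
    """2 x N receiver: returns decision statistics Z of length T."""
    N, T = len(Y), len(Y[0])
    u1 = [H[n][0] for n in range(N)] + [H[n][1].conjugate() for n in range(N)]
    u2 = [H[n][1] for n in range(N)] + [-H[n][0].conjugate() for n in range(N)]
    Z = [0j] * T
    for t in range(0, T, 2):
        # 2N-dimensional analogue of one column of Ytilde
        ytilde = [Y[n][t] for n in range(N)] + [Y[n][t + 1].conjugate() for n in range(N)]
        Z[t] = sum(a.conjugate() * y for a, y in zip(u1, ytilde))      # u1' * ytilde
        Z[t + 1] = sum(a.conjugate() * y for a, y in zip(u2, ytilde))  # u2' * ytilde
    return Z

# Noiseless usage example with a hypothetical 2 x 4 channel
rng = random.Random(0)
N, T = 4, 100
qpsk = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
symbols = [rng.choice(qpsk) for _ in range(T)]
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(2)] for _ in range(N)]
X = alamouti_encode(symbols)
Y = [[sum(H[n][m] * X[m][t] for m in range(2)) for t in range(T)] for n in range(N)]
G = sum(abs(h) ** 2 for row in H for h in row)  # sum of |H(n,m)|^2 over all entries
Z = alamouti_decode(Y, H)                       # without noise, Z = G * s / sqrt(2)
```

The gain G now sums 2N terms. This is why the scatter plots tend to tighten as N grows, even though individual channel realizations vary.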

We now consider spatial multiplexing in a 2 x N MIMO system. We can now send 2T symbols
over  T  time  intervals,  as  in  the  following  code  fragment. 

Code  Fragment  8.7.5  (Transmitted  samples  for  two-fold  spatial  multiplexing) 

%QPSK symbols normalized to unit power
symbols = (sign(rand(1,2*T) - 0.5) + j*sign(rand(1,2*T) - 0.5))/sqrt(2);

X = zeros(M,T);

%normalize samples so as to emit unit power per unit time
X(1,:) = symbols(1:2:2*T)/sqrt(2);
X(2,:) = symbols(2:2:2*T)/sqrt(2);


Step 5: Setting M = 2, N = 4 and SNR at 10 dB, put code fragments 8.7.1, 8.7.5, and 8.7.2
together to model the transmitted and received matrices X and Y. Setting T = 100, again do
a scatter plot of the real and imaginary parts of the received samples for each receive antenna.
As  before,  the  received  samples  should  be  smeared  out  over  the  complex  plane,  since  the  signals 
from  the  two  transmit  antennas  interfere  with  each  other  at  the  receive  antennas. 

Step 6: Now, apply a ZF correlator as in (8.74) to Y to separate the two data streams. Do
scatter  plots  of  the  two  estimated  data  streams.  You  should  recover  noisy  QPSK  constellations. 
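Equation (8.74) is not reproduced in this excerpt. The Python sketch below therefore uses the standard zero-forcing form $\hat{\mathbf{x}}[t] = (H^H H)^{-1} H^H \mathbf{y}[t]$, which we assume matches the ZF correlator of (8.74) up to notation. Only a 2 x 2 inverse is needed because M = 2. The function name is hypothetical.

```python
import math
import random

def zf_separate(Y, H):
    """Zero-forcing for M = 2 streams: x_hat[t] = (H^H H)^{-1} H^H y[t]."""
    N, T = len(Y), len(Y[0])
    # Gram matrix R = H^H H (2 x 2) and its explicit inverse
    R = [[sum(H[n][i].conjugate() * H[n][k] for n in range(N)) for k in range(2)] for i in range(2)]
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    Rinv = [[R[1][1] / det, -R[0][1] / det], [-R[1][0] / det, R[0][0] / det]]
    streams = [[0j] * T for _ in range(2)]
    for t in range(T):
        mf = [sum(H[n][i].conjugate() * Y[n][t] for n in range(N)) for i in range(2)]  # H^H y[t]
        for i in range(2):
            streams[i][t] = Rinv[i][0] * mf[0] + Rinv[i][1] * mf[1]
    return streams

# Noiseless usage example: two-fold spatial multiplexing over a hypothetical 2 x 4 channel
rng = random.Random(0)
N, T = 4, 100
qpsk = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
symbols = [rng.choice(qpsk) for _ in range(2 * T)]
X = [[s / math.sqrt(2) for s in symbols[0::2]],   # like symbols(1:2:2*T)/sqrt(2)
     [s / math.sqrt(2) for s in symbols[1::2]]]   # like symbols(2:2:2*T)/sqrt(2)
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(2)] for _ in range(N)]
Y = [[sum(H[n][m] * X[m][t] for m in range(2)) for t in range(T)] for n in range(N)]
streams = zf_separate(Y, H)
```

In MATLAB the same operation can be written as `Xhat = (H'*H)\(H'*Y);`. The outputs estimate the rows of X, which carry the symbols scaled by $1/\sqrt{2}$. Multiply by `sqrt(2)` if you want the scatter plots on the unit QPSK scale.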

Step 7: Fixing SNR at 10 dB and fixing a 2 x 4 channel matrix, compare the scatter plots
of  the  decision  statistics  for  the  Alamouti  scheme  with  those  for  two-fold  spatial  multiplexing. 
Which  ones  appear  to  be  cleaner? 

Lab  Report:  Your  lab  report  should  document  the  results  of  the  preceding  steps  in  order. 
Describe  the  reasoning  you  used  and  the  difficulties  you  encountered. 
Epilogue


We  conclude  with  a  brief  discussion  of  research  and  development  frontiers  in  communication 
systems.  This  discussion  is  speculative  by  its  very  nature  (it  is  difficult  to  predict  progress 
in science and technology) and is significantly biased by the author’s own research experience.
There  is  no  attempt  to  be  comprehensive.  The  goal  is  to  highlight  a  few  of  the  exciting  challenges 
in  communication  systems  in  order  to  stimulate  the  reader  to  explore  further. 


The  Continuing  Wireless  Story 

The  growth  of  content  on  the  Internet  continues  unabated,  driven  by  applications  such  as  video- 
on-demand,  online  social  networks,  and  online  learning.  At  the  same  time,  there  have  been 
significant advances in the sophistication of mobile devices such as smart phones and tablet
computers,  which  greatly  enhance  the  quality  of  the  content  these  devices  can  support  (e.g., 
smart  phones  today  provide  high-quality  displays  for  video  on  demand).  As  a  result,  users 
increasingly  expect  that  Internet  content  is  ubiquitously  and  seamlessly  available  on  their  mobile 
device.  This  means  that,  even  after  the  runaway  growth  of  cellular  and  WiFi  starting  in  the 
1990s,  wireless  remains  the  big  technology  story.  Mobile  operators  today  face  the  daunting  task 
of  evolving  networks  originally  designed  to  support  voice  into  broadband  networks  supplying  data 
rates of the order of 10s of Mbps or more to their users. By some estimates, this requires a 1000-fold
increase in cellular network capacity in urban areas! On the other hand, since charging by the
byte is not an option, this growth in capacity must be accomplished in an extremely cost-effective
manner, which demands significant technological breakthroughs.

At  the  other  end  of  the  economic  spectrum,  cellular  connectivity  has  reached  the  remotest  corners 
of  this  planet,  with  even  basic  voice  and  text  messaging  transforming  lives  in  developing  nations 
by  providing  access  to  critical  information  (e.g.,  enabling  farmers  to  obtain  timely  information  on 
market  prices  and  weather).  The  availability  of  more  sophisticated  mobile  devices  implies  that 
ongoing  revolutionary  developments  in  online  education  and  healthcare  can  reach  underserved 
populations  everywhere,  as  long  as  there  is  adequate  connectivity  to  the  Internet.  The  lack  of 
such  connectivity  is  commonly  referred  to  as  the  digital  divide. 

Wireless  researchers  now  face  the  challenge  of  building  on  the  great  expectations  created  by  the 
success  of  the  technologies  they  have  created.  At  one  end,  how  can  we  scale  cellular  network 
capacity  by  several  orders  of  magnitude,  in  order  to  address  the  exponential  growth  in  demand 
for  wireless  data  created  by  smart  mobile  devices?  At  the  other  extreme,  how  do  we  close  the 
digital  divide,  ensuring  that  even  the  most  remote  regions  of  our  planet  gain  access  to  the  wealth 
of  information  available  online?  In  addition,  there  are  a  number  of  specialized  applications  of 
wireless  that  may  assume  significant  importance  as  time  evolves. 

We  summarize  some  key  concepts  driving  this  continuing  technology  story  in  the  following. 

Small  cells:  There  are  two  fundamental  approaches  to  scaling  up  data  rates:  increasing  spatial 
reuse  (i.e.,  using  the  same  time-bandwidth  resources  at  locations  that  are  far  enough  apart), 
and increasing communication bandwidth. Decreasing cell sizes from macrocells with diameters
of the order of kilometers to picocells with diameters of the order of 100-200 meters increases
spatial reuse, and hence potentially the network capacity, by two orders of magnitude. Picocellular
base stations may be opportunistically deployed on lampposts or rooftops, and see a very
different  propagation  and  interference  environment  from  macrocellular  base  stations  carefully 
placed  at  elevated  locations.  Interference  among  adjacent  picocells  becomes  a  major  bottleneck, 
as  does  the  problem  of  handing  off  rapidly  moving  users  as  they  cross  cell  boundaries  (indeed, 
cell boundaries are difficult to even define in picocellular networks due to the complexity of
below-rooftop  propagation).  Thus,  it  is  important  to  rethink  the  design  philosophy  of  tightly 
controlled  deployment  and  operation  in  today’s  macrocellular  networks.  The  scaling  and  organic 
growth of picocellular networks is expected to require a significantly greater measure of decentralized
self-organization, including, for example, auto-configuration for plug-and-play deployment,
decentralized coordination for interference and mobility management, and automatic fault detection
and self-healing. Another critical issue with small cells is backhaul (i.e., connecting each
base station to the wired Internet): pulling optical fiber to every lamppost on which a picocellular
base station is deployed may not be feasible. Finally, we can go to even smaller cells called
femtocells,  with  base  stations  typically  deployed  indoors,  in  individual  homes  or  businesses,  and 
using  the  last  mile  broadband  technology  already  deployed  in  such  places  for  backhaul.  For  both 
picocells  and  femtocells,  it  is  important  to  devise  efficient  techniques  for  sharing  spectrum,  and 
managing  potential  interference,  with  the  macrocellular  network.  In  essence,  we  would  like  to 
be  able  to  opportunistically  deploy  base  stations  as  we  do  WiFi  access  points,  but  coordinate 
just enough to avoid the tragedy of the commons resulting from the purely selfish behavior in
unmanaged  WiFi  networks.  Of  course,  as  we  learn  more  about  how  to  scale  such  self-organized 
cellular  networks,  we  might  be  able  to  apply  some  of  the  ideas  to  promote  peaceful  coexistence 
in  densely  deployed  and  independently  operated  WiFi  networks  using  unlicensed  spectrum.  In 
short, it is fair to say that there is a clear opportunity and dire need for significant innovations in
overall design approach as well as specific technological breakthroughs, in order to truly attain
the  potential  of  “small  cells.” 
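Here is a back-of-the-envelope version of the two-orders-of-magnitude claim for small cells. It rests on a simplifying assumption of ours: every cell reuses the full spectrum and supports roughly the same throughput regardless of size. Under that assumption, capacity per unit area scales with the number of cells per unit area, i.e., inversely with the square of the cell diameter $d$:

$$\frac{C_{\text{pico}}}{C_{\text{macro}}} \approx \left(\frac{d_{\text{macro}}}{d_{\text{pico}}}\right)^2 = \left(\frac{1\ \text{km}}{100\ \text{m}}\right)^2 = 100.$$

The interference, handoff, and backhaul issues discussed above are among the reasons this idealized gain is hard to realize in practice.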

Millimeter wave communication: While commercial wireless networks deployed today employ
bands well below 10 GHz, there is significant interest in exploring higher carrier frequencies, where
there  are  vast  amounts  of  spectrum.  Of  particular  interest  are  millimeter  (mm)  wave  frequencies 
from  30-300  GHz,  corresponding  to  wavelengths  from  10  mm  down  to  1  mm.  Historically, 
RF  front  end  technology  for  these  bands  has  been  expensive  and  bulky,  hence  there  was  limited 
commercial  interest  in  using  them.  This  has  changed  in  recent  years,  with  the  growing  availability 
of low-cost silicon radio frequency integrated circuits (RFICs) in these bands. The particular slice
of  spectrum  that  has  received  the  most  attention  is  the  60  GHz  band  (from  57-64  GHz).  Most  of 
this  band  is  unlicensed  worldwide.  The  availability  of  7  GHz  of  unlicensed  spectrum  (vastly  more 
than  the  bandwidth  in  current  cellular  and  WiFi  systems)  opens  up  the  possibility  for  another 
revolution  in  wireless  communication,  with  links  operating  at  multiples  of  Gigabits  per  second 
(Gbps). Potential applications of 60 GHz in particular, and mm wave in general, include order of
magnitude increases in the data rates for indoor wireless networks, multi-Gbps wireless backhaul
networks, base station to mobile links in picocells, and even wireless data centers. However,
realizing the vision of multi-Gbps wireless everywhere is going to take some work. While we can
draw  upon  the  existing  toolkit  of  ideas  developed  for  wireless  communication  to  some  extent, 
we  may  have  to  rethink  many  of  these  ideas  because  of  the  unique  characteristics  of  mm  wave 
communication.  The  latter  largely  follow  from  the  order  of  magnitude  smaller  carrier  wavelength 
relative  to  existing  wireless  systems. 

At  the  most  fundamental  level,  consider  propagation  loss.  As  discussed  in  Section  6.5  (see  also 
Problems  6.37  and  6.38),  the  propagation  loss  for  omnidirectional  transmission  scales  with  the 
square  of  the  carrier  frequency,  but  for  the  same  antenna  aperture,  antenna  directivity  scales 
up  with  the  square  of  the  carrier  frequency.  Thus,  given  that  generating  RF  power  at  high 
carrier  frequencies  is  difficult,  we  anticipate  that  mm  wave  communication  systems  will  employ 
antenna directivity at both ends. Since the inter-element spacing scales with carrier wavelength,
it becomes possible to accommodate a large number of antenna elements in a small area (e.g.,
a  1000  element  antenna  array  at  60  GHz  is  palm-sized!),  and  use  electronic  beamsteering  to 
realize  pencil  beams  at  the  transmitter  and  receiver.  Of  course,  this  is  easier  said  than  done. 
Hardware  realization  of  such  large  arrays  remains  a  challenge.  On  the  algorithmic  side,  since 
building a separate upconverter or downconverter for every antenna element is infeasible as we
scale  up  the  array,  it  is  essential  to  devise  signal  processing  algorithms  that  do  not  assume  the 
availability  of  the  separate  complex  baseband  signals  for  each  antenna  element.  The  nature 
of  diversity  and  spatial  multiplexing  also  fundamentally  changes  at  tiny  wavelengths:  due  to 
the  directionality  of  mm  wave  links,  there  are  only  a  few  dominant  propagation  paths,  so  that 
designs  for  rich  scattering  models  no  longer  apply.  For  indoor  environments,  blockage  by  humans 
and  furniture  becomes  inevitable,  since  the  ability  of  electromagnetic  waves  to  diffract  around 
obstacles  depends  on  how  large  they  are  relative  to  the  wavelength  (i.e.,  obstacles  “look  bigger” 
at tiny wavelengths). For outdoor environments, performance is limited by oxygen absorption
loss (about 16 dB/km) in the 60 GHz band, and by rain loss in the mm wave band in general
(e.g., as high as 30 dB/km in heavy rain). While link ranges of hundreds
of meters can be achieved with reasonable margins to account for these effects, longer ranges
than these would be fighting physics, hence multihop networks become interesting. Of course,
once  we  start  forming  pencil  beams,  networking  protocols  that  rely  on  the  broadcast  nature 
of  the  wireless  medium  no  longer  apply.  These  are  just  a  few  of  the  issues  that  are  probably 
going to take significant research and development to iron out, which bodes well for aspiring
communication  engineers. 
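The scaling arguments in the preceding paragraph can be checked numerically with the Friis formula. With isotropic antennas the path gain is $(\lambda/4\pi R)^2$. With fixed effective apertures $A_t, A_r$ at both ends it is $A_t A_r/(\lambda R)^2$, valid in the far field. The Python sketch below uses illustrative frequencies, ranges, and apertures of our choosing. It also assumes half-wavelength element spacing for the array-size estimate, and it applies the oxygen and heavy-rain figures quoted above as a worst case.

```python
import math

C = 3e8  # speed of light in m/s (approximate)

def wavelength(freq_hz):
    return C / freq_hz

def friis_omni_db(freq_hz, range_m):
    """Free-space path gain with isotropic antennas, (lambda / (4 pi R))^2, in dB."""
    return 20 * math.log10(wavelength(freq_hz) / (4 * math.pi * range_m))

def friis_fixed_aperture_db(freq_hz, range_m, area_tx_m2, area_rx_m2):
    """Path gain with fixed effective apertures at both ends, A_t A_r / (lambda R)^2, in dB."""
    lam = wavelength(freq_hz)
    return 10 * math.log10(area_tx_m2 * area_rx_m2 / (lam * range_m) ** 2)

def array_side_m(freq_hz, elements_per_side):
    """Side length of a square array with half-wavelength spacing (assumed)."""
    return elements_per_side * wavelength(freq_hz) / 2

def excess_loss_db(range_m, db_per_km):
    """Extra attenuation (e.g., oxygen absorption plus rain) over a link of the given range."""
    return db_per_km * range_m / 1000

# 6 GHz vs 60 GHz over 100 m, with 10 cm^2 apertures at each end
omni_change = friis_omni_db(60e9, 100) - friis_omni_db(6e9, 100)  # about -20 dB
aperture_change = (friis_fixed_aperture_db(60e9, 100, 1e-3, 1e-3)
                   - friis_fixed_aperture_db(6e9, 100, 1e-3, 1e-3))  # about +20 dB
side = array_side_m(60e9, 32)       # 32 x 32 = 1024 elements: about 8 cm on a side
extra = excess_loss_db(200, 16 + 30)  # 200 m link, oxygen + heavy rain: about 9.2 dB
```

A tenfold increase in carrier frequency costs 20 dB with omnidirectional antennas, but gains 20 dB when both apertures are held fixed. This is the quantitative basis for expecting directivity at both ends of a mm wave link.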

Figure  8.28  depicts  how  picocells  and  mm  wave  communication  might  come  together  to  address 
the  cellular  capacity  crisis.  A  large  macrocellular  base  station  provides  default  connectivity 
via  Long  Term  Evolution  (LTE),  a  fourth  generation  cellular  technology  standardized  relatively 
recently;  despite  its  name,  it  may  not  suffice  for  the  long  term  because  of  exponentially  increasing 
demand.  Picocellular  base  stations  are  deployed  opportunistically  on  lampposts,  and  may  be 
connected  via  a  mm  wave  backhaul  network.  Users  in  a  picocell  could  talk  to  the  base  stations 
using  LTE,  or  perhaps  even  mm  wave.  Users  not  covered  by  picocells  talk  to  the  macrocell  using 
LTE. 

Cooperative communication: While we have studied communication between a single transmitter
and receiver in this book, such point-to-point links provide a building block for emerging ideas in cooperative
communication. For example, neighboring nodes could form a virtual antenna array, forming distributed
MIMO (DMIMO) systems with significantly improved power-bandwidth tradeoffs. This allows
us, for example, to bring the benefits of MIMO to systems with low carrier frequencies, which
propagate well over large distances but are not compatible with centralized antenna arrays because
of the large carrier wavelength. For example, the wavelength at 50 MHz is 6 meters, hence
conventional antenna arrays would be extremely bulky, but neighboring nodes naturally spaced
by tens of meters could form a DMIMO array. DMIMO at low carrier frequencies is a promising
approach for bridging the digital divide in cost-effective fashion, providing interference suppression
and multiplexing capabilities as in MIMO, along with link ranges of tens of kilometers.
Another promising example of cooperative communication is interference alignment, in which
multiple transmitters, each of which is sending to a different receiver, coordinate so as to ensure
that the interference they generate for each other is limited in time-frequency space. Of course,
realizing the benefits of cooperative communication requires fundamental advances in distributed
synchronization and channel estimation, along with new network protocols that support these
innovations. More good news for the next generation of communication engineers!

Figure 8.28: The potential role of small cells and mm wave communication in future cellular
systems (figure courtesy Dinesh Ramasamy).

Full-duplex communication: Most communication transceivers cannot transmit and receive at
the same time on the same frequency band (or even closely spaced bands). This is because even
a small amount of leakage from the transmit chain can swamp out the received signal, which is
much weaker. Thus, communication networks typically operate in time division duplexed (TDD)
mode (also more loosely termed half duplex mode), in which the transmitter and receiver use the
same band but are not active at the same time, or in frequency division duplexed (FDD) mode,
in which the transmitter and receiver may be simultaneously active, but in different, typically
widely separated, bands. There has been promising progress recently, however, on relatively low-cost
approaches to canceling interference from the transmit chain, seeking to make full-duplex
operation (i.e., sending and receiving at the same time in the same band) feasible. If these
techniques turn out to be robust and practical, then they could lead to significant performance
enhancements in wireless networks. Of course, networking with full-duplex links requires revisiting
current protocols, which are based on either TDD or FDD.
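The core idea behind canceling transmit-chain interference can be illustrated with the simplest digital idealization. The node knows its own transmitted samples $x[t]$, so it can estimate a leakage gain $a$ in $y[t] = a\,x[t] + s[t] + n[t]$ by least squares and subtract $\hat{a}\,x[t]$. This single-tap model is our own assumption. It ignores effects such as transmitter nonlinearity, multipath leakage, and the ADC's limited dynamic range, which practical schemes must also handle. All names and values below are hypothetical.

```python
import cmath
import math
import random

def estimate_leakage(received, own_tx):
    """Least-squares estimate of a single complex leakage gain a in y = a*x + (everything else)."""
    num = sum(y * x.conjugate() for y, x in zip(received, own_tx))
    den = sum(abs(x) ** 2 for x in own_tx)
    return num / den

def cancel(received, own_tx, a_hat):
    """Subtract the reconstructed self-interference a_hat * x from the received samples."""
    return [y - a_hat * x for y, x in zip(received, own_tx)]

def power(samples):
    return sum(abs(s) ** 2 for s in samples) / len(samples)

# Hypothetical example: leakage 60 dB stronger than a unit-power desired signal
rng = random.Random(0)
T = 10000
qpsk = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
own_tx = [rng.choice(qpsk) for _ in range(T)]   # our own transmitted samples (known)
desired = [rng.choice(qpsk) for _ in range(T)]  # the other node's signal
a_true = 1000 * cmath.exp(0.7j)                 # leakage amplitude 1000 -> 60 dB
noise = [complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(T)]
received = [a_true * x + s + n for x, s, n in zip(own_tx, desired, noise)]

a_hat = estimate_leakage(received, own_tx)
residual = cancel(received, own_tx, a_hat)
cancellation_db = 10 * math.log10(power(received) / power(residual))  # close to 60 dB
```

The desired signal survives the subtraction because it is uncorrelated with $x[t]$. The estimation error in $\hat{a}$ shrinks roughly as $1/\sqrt{T}$.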

Challenging  channels:  While  we  have  discussed  issues  related  to  significant  improvements  in 
wireless  data  rates  and  ranges  relative  to  large-scale  commercial  wireless  networks  today,  there 
are  important  applications  where  simply  forming  and  maintaining  a  viable  link  is  a  challenge. 
Examples  include  underwater  acoustic  networks  (for  sensing  and  exploration  in  oceans,  rivers 
and  lakes)  and  body  area  networks  (for  continuous  health  monitoring). 

Wireless-enabled multi-agent systems: Wireless is at the heart of many emerging systems that
require  communication  and  coordination  between  a  variety  of  “agents”  (these  may  be  machines 
or  humans).  Examples  include  asset  tracking  and  inventory  management  using  radio  frequency 
identification  (RFID)  tags;  sensor  networks  for  automation  in  manufacturing,  environmental 
monitoring,  healthcare  and  assisted  living;  vehicular  communication;  smart  grid;  and  nascent 
concepts  such  as  autonomous  robot  swarms.  Such  “multi-agent”  systems  rely  on  wireless  to 
provide tetherless connectivity among agents, as well as to possibly provide radar-style measurements,
hence characterization and optimization of the wireless network in each specific context
is  essential  for  sound  system  design. 


Scaling  mostly  digital  transceivers 

As  discussed  in  Chapter  1,  a  key  technology  story  that  has  driven  the  growth  of  communication 
systems  is  Moore’s  law,  which  allows  us  to  inexpensively  implement  sophisticated  DSP  algorithms 
in communication transceivers. A modern “mostly digital” receiver typically has analog-to-digital
converters (ADCs) representing each I and Q sample with 8-12 bits of precision. As
communication  data  rates  and  bandwidths  increase,  Moore’s  law  will  probably  be  able  to  keep 
up  for  a  while  longer,  but  the  ADC  becomes  a  bottleneck  as  signal  bandwidths,  and  hence 
the  required  sampling  rates,  scale  to  GHz  and  beyond.  High-speed,  high-precision  ADCs  are 
power-hungry,  occupy  large  chip  areas  (and  are  therefore  expensive),  and  are  difficult  to  build. 
Thus,  a  major  open  question  in  communication  systems  is  whether  we  can  continue  to  enjoy  the 
economies  of  scale  provided  by  “mostly  digital”  architectures  as  communication  bandwidths  go 
up. 

We  have  already  discussed  the  potential  for  mm  wave  communication  and  its  unique  challenges. 
The  ADC  bottleneck  is  one  more  challenge  we  must  add  to  the  list,  but  this  challenge  applies 
to  any  communication  system  which  seeks  to  employ  DSP  while  scaling  up  bandwidth.  An 
important  example  is  fiber  optic  communication,  where  the  bandwidths  involved  are  huge.  For 
the  longest  time,  these  systems  have  operated  using  elementary  signaling  schemes  such  as  on-off 
keying,  with  mostly  analog  processing  at  the  receiver.  However,  researchers  are  now  seeking  to 
bring  the  sophistication  of  wireless  transceivers  to  optical  communication.  By  making  optical 
communication more spectrally efficient, we can increase data rates on fibers already buried in the
ground,  simply  by  replacing  the  transceivers  at  each  end.  Sophisticated  algorithms  are  critical 
for  achieving  this,  and  these  are  best  implemented  in  DSP.  Furthermore,  by  making  optical 
transceivers  mostly  digital,  we  could  obtain  the  economies  of  scale  required  for  high-volume 
applications  such  as  very  short-range  chip-to-chip,  or  intra-chip,  communication.  Compared 
to wireless communication, optical communication presents special challenges due to fiber
nonlinearities and its higher bandwidth, while not facing the difficulties arising from
mobility. 
Yet another area where we seek increased speed and sophistication is wired backplane communication
for interconnecting hardware modules on a circuit board (e.g., inputs and outputs for a
high-speed router, or processor and memory modules in a computer), and “networks on chip” for
communicating  between  modules  on  a  single  integrated  circuit  (e.g.,  for  a  “multi-core”  processor 
chip  with  multiple  processor  and  memory  modules). 

How  does  one  overcome  the  ADC  bottleneck?  We  do  not  have  answers  yet,  but  there  are  some 
natural ideas to try. One possibility is to try to get by with fewer bits of precision per sample.
Severe quantization introduces a significant nonlinearity, but it is possible that we could still
extract  enough  information  for  reliable  communication  if  the  dynamic  range  of  the  analog  signal 
being  quantized  is  not  too  large.  Of  course,  the  algorithms  that  we  have  seen  in  this  textbook 
(e.g.,  for  demodulation,  linear  equalization,  MIMO  processing)  all  rely  on  the  linearity  of  the 
channel not being disturbed by the ADC, an excellent approximation for high-precision ADCs.
This  assumption  now  needs  to  be  thrown  out:  in  essence,  we  must  “redo”  DSP  for  communication 
if  we  are  going  to  live  with  low-precision  ADC  at  the  receiver.  Another  possibility  is  to  parallelize: 
we  could  implement  a  high-speed  ADC  by  running  lower  speed  ADCs  in  parallel,  or  we  could 
decompose  the  communication  signal  in  the  frequency  domain,  such  that  relatively  low-speed 
ADCs  with  high  precision  can  be  used  in  parallel  for  different  subbands.  These  are  areas  of 
active  research. 
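The Python sketch below quantifies what is given up when precision is reduced. It passes a full-scale ramp through a uniform mid-rise quantizer and measures the signal-to-quantization-noise ratio (SQNR). The quantizer design and the uniformly distributed test signal are our assumptions. For such a signal the SQNR is about 6 dB per bit, i.e., $10\log_{10}4^b$. Real communication signals (e.g., OFDM, which looks closer to Gaussian) fare somewhat differently.

```python
import math

def quantize(x, bits, full_scale=1.0):
    """Uniform mid-rise quantizer over [-full_scale, full_scale) with 2**bits levels."""
    levels = 2 ** bits
    step = 2 * full_scale / levels
    index = math.floor(x / step)
    index = max(-levels // 2, min(levels // 2 - 1, index))  # clip to the available levels
    return (index + 0.5) * step

def sqnr_db(bits, num_points=100_000):
    """SQNR for a deterministic ramp that uniformly covers the full-scale range."""
    xs = [-1 + (k + 0.5) * 2 / num_points for k in range(num_points)]
    signal = sum(x * x for x in xs)
    error = sum((x - quantize(x, bits)) ** 2 for x in xs)
    return 10 * math.log10(signal / error)

for b in (1, 2, 4, 8):
    print(b, round(sqnr_db(b), 1))
```

With a single bit the quantizer keeps only the sign of each sample, so all amplitude information is lost. This is why linear algorithms designed for high-precision ADCs cannot simply be reused at very low precision.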


Beyond  Moore’s  law 

Moore’s  law  has  been  working  because  the  semiconductor  industry  keeps  managing  to  shrink 
feature  sizes  on  integrated  circuits,  making  transistors  (which  are  then  used  as  building  blocks 
for  digital  logic)  tinier  and  tinier.  Many  in  the  industry  now  say  that  the  time  is  approaching 
when  shrinking  transistors  in  this  fashion  will  make  their  behavior  non-deterministic  (i.e.,  their 
output  can  have  errors,  just  like  the  output  of  a  demodulator  in  a  communication  system).  Doing 
deterministic  logic  computations  with  non-deterministic  units  is  a  serious  challenge,  which  looks 
to  be  more  difficult  than  reliable  communication  over  a  noisy  channel.  However,  it  is  an  intriguing 
question  as  to  whether  it  is  possible  to  use  ideas  similar  to  those  in  digital  communication  to 
evolve  new  paradigms  for  reliable  computation  with  unreliable  units.  This  is  a  grand  challenge 
to which experts in communication systems might be able to make significant contributions.


Parting  thoughts 

The  introductory  treatment  in  this  textbook  is  intended  to  serve  as  a  gateway  to  an  exciting  future 
in  communications  research  and  technology  development.  We  hope  that  this  discussion  gives  the 
reader  the  motivation  for  further  study  in  this  area,  using,  for  example,  more  advanced  textbooks 
and the research literature. We do not provide specific references for the topics mentioned in
this  epilogue,  because  research  in  many  of  these  areas  is  evolving  too  rapidly  for  a  few  books 
or  papers  to  do  it  justice.  Of  course,  the  discussion  does  provide  plenty  of  keywords  for  online 
searches,  which  should  bring  up  interesting  material  to  follow  up  on. 




Bibliography 


[1]  S.  Haykin,  Communications  Systems.  Wiley,  2000. 

[2]  J.  G.  Proakis  and  M.  Salehi,  Fundamentals  of  Communication  Systems.  Prentice  Hall,  2004. 

[3]  M.  B.  Pursley,  Introduction  to  Digital  Communications.  Prentice  Hall,  2003. 

[4]  R.  E.  Ziemer  and  W.  H.  Tranter,  Principles  of  Communication:  Systems,  Modulation  and 
Noise.  Wiley,  2001. 

[5]  J.  R.  Barry,  E.  A.  Lee,  and  D.  G.  Messerschmitt,  Digital  Communication.  Kluwer  Academic 
Publishers,  2004. 

[6] S. Benedetto and E. Biglieri, Principles of Digital Transmission: With Wireless Applications.
Springer,  1999. 

[7]  U.  Madhow,  Fundamentals  of  Digital  Communication.  Gambridge  University  Press,  2008. 

[8]  J.  G.  Proakis  and  M.  Salehi,  Digital  Communications.  McGraw  Hill,  2007. 

[9]  J.  M.  Wozencraft  and  I.  M.  Jacobs,  Principles  of  Communication  Engineering.  Wiley,  1965. 
reissued  by  Waveland  Press  in  1990. 

[10]  A.  J.  Viterbi  and  J.  K.  Omura,  Principles  of  Digital  Communication  and  Coding.  Mc-Graw 
Hill,  1979. 

[11]  R.  E.  Blahut,  Digital  Transmission  of  Information.  Addison- Wesley,  1990. 

[12]  J.  D.  Gibson,  ed..  The  Mobile  Communications  Handbook.  CRG  Press,  2012. 

[13]  K.  Sayood,  Introduction  to  Data  Compression.  Morgan  Kaufmann,  2005. 

[14]  D.  P.  Bertsekas  and  R.  G.  Gallager,  Data  Networks.  Prentice-Hall,  1991. 

[15]  A.  Kumar,  D.  Manjunath,  and  J.  Kuri,  Communication  Networking:  An  Analytical  Ap¬ 
proach.  Morgan  Kaufmann,  2004. 

[16]  J.  Walrand  and  P.  Varaiya,  High  Performance  Communication  Networks.  Morgan- 
Kauffmann,  2000. 

[17]  A.  V.  Oppenheim,  A.  S.  Whisky,  and  S.  H.  Nawab,  Signals  and  Systems.  Prentice  Hah, 
1996. 

[18]  B.  P.  Lathi,  Linear  Systems  and  Signals.  Oxford  University  Press,  2004. 

[19]  A.  V.  Oppenheim  and  R.  W.  Schafer,  Discrete-Time  Signal  Processing.  Prentice  Hah,  2009. 

[20]  S.  K.  Mitra,  Digital  Signal  Processing:  A  Computer-based  Approach.  McGraw-Hill,  2010. 




[21] R. D. Yates and D. J. Goodman, Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers. Wiley, 2004.

[22] J. W. Woods and H. Stark, Probability and Random Processes with Applications to Signal Processing. Prentice Hall, 2001.

[23] A. Leon-Garcia, Probability and Random Processes for Electrical Engineering. Prentice Hall, 1993.

[24] A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes. McGraw-Hill, 2002.

[25] J. B. Johnson, “Thermal agitation of electricity in conductors,” Phys. Rev., vol. 32, pp. 97-109, 1928.

[26] H. Nyquist, “Thermal agitation of electric charge in conductors,” Phys. Rev., vol. 32, pp. 110-113, 1928.

[27] D. Abbott, B. Davis, N. Phillips, and K. Eshraghian, “Simple derivation of the thermal noise formula using window-limited Fourier transforms and other conundrums,” IEEE Transactions on Education, vol. 39, pp. 1-13, Feb. 1996.

[28] R. Sarpeshkar, T. Delbruck, and C. Mead, “White noise in MOS transistors and resistors,” IEEE Circuits and Devices Magazine, vol. 9, pp. 23-29, Nov. 1993.

[29] V. A. Kotelnikov, The Theory of Optimum Noise Immunity. McGraw-Hill, 1959.

[30] G. D. Forney, “Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference,” IEEE Trans. Information Theory, vol. 18, pp. 363-378, 1972.

[31] G. Ungerboeck, “Adaptive maximum-likelihood receiver for carrier-modulated data-transmission systems,” IEEE Trans. Communications, vol. 22, pp. 624-636, 1974.

[32] S. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Prentice Hall, 1993.

[33] H. V. Poor, An Introduction to Signal Detection and Estimation. Springer, 2005.

[34] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. Wiley, 2001.

[35] R. E. Blahut, Modem Theory: An Introduction to Telecommunications. Cambridge University Press, 2009.

[36] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 2006.

[37] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948.

[38] R. J. McEliece, The Theory of Information and Coding. Cambridge University Press, 2002.

[39] R. E. Blahut, Algebraic Codes for Data Transmission. Cambridge University Press, 2003.

[40] S. Lin and D. J. Costello, Error Control Coding. Prentice Hall, 2004.

[41] T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms. Wiley, 2005.

[42] A. Goldsmith, Wireless Communications. Cambridge University Press, 2005.




[43] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.

[44] G. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs Technical Journal, vol. 1, no. 2, pp. 41-59, 1996.

[45] E. Telatar, “Capacity of multi-antenna Gaussian channels,” AT&T Bell Labs Internal Tech. Memo # BL0112170-950615-07TM, June 1995.

[46] S. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, pp. 1451-1458, October 1998.

[47] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Communications. Cambridge University Press, 2003.

[48] H. Jafarkhani, Space-Time Coding: Theory and Practice. Cambridge University Press, 2003.

[49] H. Bolcskei, D. Gesbert, C. B. Papadias, and A. J. van der Veen, eds., Space-Time Wireless Systems: From Array Processing to MIMO Communications. Cambridge University Press, 2006.




Index 


AM,  92 

envelope  detection,  93 
modulation  index,  92 
amplitude  modulation,  89 
conventional  AM,  92 
DSB-SC,  89 
QAM,  105 
SNR,  266 
SSB,  98 
VSB,  103 

analog  communication 
block  diagram,  16 
analog  modulation 

legacy  systems,  128 
SNR,  266 

angle  modulation,  107 
FM,  107 
PM,  107 
SNR,  269 

antipodal  signaling,  305 
AWGN  channel 

optimal  reception,  295 

bandwidth,  61,  147 

fractional  power  containment,  147 
Bandwidth  efficiency 

orthogonal  modulation,  163 
bandwidth  efficiency 

linear  modulation,  155 
baseband  channel,  61 
baseband  signal,  61 
Bayes’  rule,  189 
belief  propagation,  370 
check  node  update,  373 
software  lab,  382 
variable  node  update,  372 
binary  symmetric  channel,  358 
capacity,  360 

biorthogonal  modulation,  163 

bit  interleaved  coded  modulation,  352 

capacity 

bandlimited  AWGN  channel,  356 
binary  symmetric  channel,  360 


discrete-time  AWGN  channel,  356 
CDF, 192
joint,  196 
cellular  networks 
introduction,  21 
picocells,  445 
central  limit  theorem,  261 
channel  code 

bounded  distance  decoding,  366 
linear,  361 

minimum  distance,  365 
channel  decoder 
role  of,  19 
channel  encoder 
role  of,  18 

coherence  bandwidth,  56 
complementary CDF, 193
Complex baseband representation
frequency  domain  relationship,  68 
complex  baseband  representation,  63 
filtering,  73 

role  in  transceiver  implementation,  75 
wireless  channel  modeling,  76 
complex  envelope,  66 
complex  exponential,  31 
complex  numbers,  27 
conditional  error  probabilities,  279 
constellations,  144 
convolution,  39 

discrete  time,  43 
correlator 

for  optimal  reception,  297 
Covariance
matrix,  217 
properties,  217 
covariance,  210 

cumulative distribution function, see CDF

dBm,  236 
decoding 

belief  propagation,  370 
bit  flipping,  368 
bounded  distance,  366 
delay  spread,  56 
delta  function,  32 




demodulator 
role  of,  19 
density,  192 

conditional,  198 
joint,  196,  197 
digital  communication 
advantages,  20 
block  diagram,  17 
discrete  memoryless  channel,  354 
dispersive  channel 

software  model,  391 
double  sideband 
see  DSB,  89 
downconversion,  64 
DSB,  89 

need  for  coherent  demodulation,  91 

energy,  34 
energy per bit (E_b)

binary  signaling,  304 
energy  spectral  density,  60 
Equalization,  387 
Euler’s  formula,  28 

fading 

frequency-selective,  56 
FM

Carson’s  rule,  112 
frequency  deviation,  109 
limiter-discriminator  demodulation,  109 
modulation  index,  109 
noise  analysis,  270 
PLL-based  demodulator,  121 
preemphasis  and  deemphasis,  273 
spectrum, 111
threshold  effect,  273 
Fourier  series,  46 
properties,  49 
Fourier  transform,  51 

DFT-based  computation,  57 
numerical  computation,  57 
properties,  53 

frequency modulation, see FM
frequency  shift  keying,  see  FSK 
Friis  formula,  324 
FSK,  160 

gamma  function,  208 
Gaussian  random  process,  234 
Gaussian  random  vector,  218 
Gram-Schmidt  orthogonalization,  290 
Gray  code,  321 

Hamming  distance,  365 


Hamming  weight,  365 
Hilbert  transform,  100 
hypothesis  testing,  278 

I  and  Q  channels 

orthogonality  of,  65 
impulse,  32 
impulse  response,  39 
indicator  function,  33 
inner  product,  33 
Internet 

introduction,  21 

Intersymbol  interference,  see  ISI 
ISI,  387 

eye  diagram,  392 
vector  model,  401 

Jacobian  matrix,  204 
joint  distribution,  196 

law  of  large  numbers,  260 
least  squares,  397 
linear  code,  361 
coset,  366 

generator  matrix,  362 
parity  check  matrix,  363 
standard  array,  366 
syndrome,  366 
Tanner  graph,  368 
linear  equalization,  394 
adaptive,  397 

geometric  interpretation,  399 
noise  enhancement,  403 
zero-forcing,  402 
linear  modulation 

bandwidth  efficiency,  155 
power  spectral  density,  146,  178 
software  lab,  174,  437 
software  model,  388 

linear  time-invariant  system,  see  LTI  system 
link  budget  analysis,  322 
examples,  325 
Link  margin,  325 

lowpass  equivalent  representation,  see  complex 
baseband  representation 
LTI  system,  36 

complex  exponential  through,  43 
eigenfunctions,  43 
impulse  response,  39 
transfer  function,  55 

marginal  distribution,  197 
matched filter, 245

for  optimal  reception,  297 




SNR  maximization,  245 
maximum  likelihood  (ML) 

geometry  of  decision  rule,  298 
millimeter  wave,  446 
MIMO,  387,  413 

Alamouti  space-time  code,  424 
beamsteering,  415 
distributed,  447 
diversity,  421 
linear  array,  413 
OFDM,  420 
receive  diversity,  422 
rich  scattering  model,  421 
SDMA,  418 
software  lab,  442 
spatial  multiplexing,  425 
stationary, 228

wide  sense  stationary  (WSS),  228 
random  variables,  191 

affine  transformation,  217 
Bernoulli,  191 
binomial,  195 
covariance,  210 
expectation,  206 
exponential,  193 
functions  of,  202 
Gaussian,  194,  208,  211 
i.i.d.,  201 
independent,  201 
joint  Gaussianity,  218 
mean,  206 
moments,  207 
multiple,  195 
Poisson,  195 
standard  Gaussian,  211 
uncorrelated,  210 
variance,  207 
random  vector,  195 
Gaussian,  218 